Method and system for virtual intelligence user interaction
- Publication number
- WO2023137078A1 (PCT/US2023/010624)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- virtual
- request
- personification
- question
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/011—Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0381—Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
Definitions
- the present invention is directed to a method and system to provide user interaction with virtual personifications using artificial intelligence ("AI").
- a system and method to generate and update a virtual personification using artificial intelligence, comprising receiving data associated with a person, the data comprising one or more of the following: text files, audio files, image files, and video files; rendering a virtual personification of the person; and outputting the virtual personification to a user. The system then receives and interprets a user input to generate a user request and updates the virtual personification in response to the user request.
- the update comprising one or more of the following.
- the virtual personification is of a person, either living or deceased. It is contemplated that the virtual personification may comprise an audio output and video output which are presented in a virtual environment of a type associated with the virtual personification. The virtual personification may comprise a representation of a non-living item.
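- As a rough illustration only, the following minimal Python sketch (all class and function names are hypothetical and not taken from the disclosure) models the high-level flow described above: receiving data associated with a person, rendering a virtual personification from that data, and updating it in response to an interpreted user request.

```python
from dataclasses import dataclass, field

@dataclass
class SourceData:
    """Pre-recorded material associated with the person being personified."""
    text_files: list = field(default_factory=list)
    audio_files: list = field(default_factory=list)
    image_files: list = field(default_factory=list)
    video_files: list = field(default_factory=list)

@dataclass
class VirtualPersonification:
    name: str
    audio_track: str = ""
    video_track: str = ""

def render_personification(data: SourceData, name: str) -> VirtualPersonification:
    # In a real system this step would synthesize audio/video from the source data.
    return VirtualPersonification(
        name=name,
        audio_track=f"voice model built from {len(data.audio_files)} clips",
        video_track=f"appearance built from {len(data.video_files)} videos")

def update_personification(p: VirtualPersonification, user_request: str) -> VirtualPersonification:
    # Update the rendered output in response to the interpreted user request.
    p.video_track = f"{p.name} responding to: {user_request!r}"
    return p

if __name__ == "__main__":
    data = SourceData(video_files=["interview.mp4"], audio_files=["talk.wav"])
    chef = render_personification(data, "Chef")
    chef = update_personification(chef, "how do I grill salmon?")
    print(chef)
```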
- the method is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module.
- the remote artificial intelligence module may be a computing device with a processor and memory storing machine readable code configured to receive the question or request from the user via the virtual reality device, process the question or request to derive a meaning, and perform one or more searches for answers to the question or request in databases unrelated to the virtual personification.
- Upon locating an answer to the question or request, the module generates data that represents the virtual personification answering the question or request and transmits the answer, or the data that represents the virtual personification answering it, to the virtual reality device for presentation to the user.
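- A minimal sketch of this local-first, remote-fallback path is shown below, assuming a hypothetical local knowledge store and remote AI module; the data stores and function names are illustrative only.

```python
# Minimal sketch of the local-first / remote-fallback flow described above.
# All function names and data stores are hypothetical illustrations.

LOCAL_KNOWLEDGE = {"what is in this recipe": "salmon, lemon, olive oil"}
REMOTE_DATABASES = [{"how hot should the pan be": "medium-high, about 200 C"}]

def answer_locally(question: str):
    return LOCAL_KNOWLEDGE.get(question.lower())

def remote_ai_module(question: str):
    # Derive a meaning (trivially normalized here) and search unrelated databases.
    meaning = question.lower().strip("?")
    for db in REMOTE_DATABASES:
        if meaning in db:
            answer = db[meaning]
            # Package the answer as data representing the personification answering.
            return {"answer": answer, "personification_clip": f"clip of chef saying: {answer}"}
    return None

def handle_question(question: str):
    local = answer_locally(question)
    if local is not None:
        return {"answer": local, "source": "local"}
    # Unable to create the response locally: transmit to the remote AI module.
    remote = remote_ai_module(question)
    return remote or {"answer": "I don't know yet.", "source": "none"}

print(handle_question("How hot should the pan be?"))
```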
- the method may further comprise tracking a user’s hand position using one or more user hand position tracking devices to determine what the user is pointing at in the virtual environment.
- the step of generating a response to the question or request may use artificial intelligence to generate an answer by searching one or more databases that contain information from the person represented by the virtual personification but which do not provide a direct answer to the question or request.
- a system for presenting an interactive, artificial intelligence assisted, virtual personification to a user comprising a virtual reality device configured to have at least a portion be worn by the user.
- the virtual reality device includes a wearable screen configured for viewing by a user, one or more speakers configured to provide audio output to the user, a microphone configured to receive audio input from the user, and one or more external input devices configured to receive input from the user.
- The virtual reality device also includes a communication module configured to communicate over a computer network or the Internet, and a processor with access to a memory. The processor executes machine readable code and the memory is configured to store the machine readable code.
- the machine readable code is configured to present a virtual environment on the wearable screen and through the one or more speakers to the user and present, to the user on the wearable screen and through the one or more speakers, a virtual personification of a person currently living or deceased, in the virtual environment.
- the code is also configured to receive a question or request from the user regarding one or more aspects of the virtual environment or the virtual personification and then generate a response to the question or request from the user, which includes generating video content and audio content which did not previously exist.
- the code then presents the generated response to the user on the wearable screen and through the one or more speakers in response to the question or request from the user.
- the machine readable code is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module.
- the remote artificial intelligence module may be a computing device with a memory and a processor, the memory storing machine readable code configured to receive the question or request from the user via the virtual reality device, process the question or request to derive a meaning, and perform one or more searches for answers to the question or request in databases unrelated to the virtual personification. Then, upon locating an answer to the question or request, it generates data that represents the virtual personification answering the question or request and transmits the answer, or the data that represents the virtual personification answering it, to the virtual reality device for presentation to the user.
- the system may further comprise one or more user hand position tracking devices configured to track a position of a user's hand to determine what the user is pointing at in the virtual environment.
- the input from the user comprises an audio input or an input to the one or more external input devices. It is contemplated that the video content and audio content which did not previously exist are generated by processing existing video, audio, or both, of the person represented by the virtual personification.
- the generated response to the question or request uses artificial intelligence to generate an answer by searching one or more databases that contain information from a person represented by the virtual personification but which do not provide a direct answer to the question or request.
- Also disclosed herein is a method for presenting an interactive experience with a virtual personification using a screen, speakers, and microphone of a user computing device.
- the method comprises presenting a virtual environment to the user on the screen and through the speakers of the user computing device, and presenting the virtual personification in the virtual environment.
- receiving input from the user comprising a question, a user request, or a subject regarding one or more aspects of the virtual environment, the virtual personification, or the actions of the virtual personification in the virtual environment.
- This method then sends a request for a response to the input from the user to an AI computing device that is remote from the user computing device and, with the AI computing device, creates a response based on pre-existing content stored in one or more databases, which is processed to create the generated response.
- the AI computing device is a computing device with a memory and a processor, the memory storing machine readable code configured to receive the input from the user computing device, process the input from the user to derive a meaning, and, based on the meaning, perform one or more searches for answers to the input from the user in databases unrelated to the virtual personification. Upon locating a response to the input from the user, it generates data that represents the virtual personification answering the question or request, and transmits the data, which represents the virtual personification responding to the input from the user, to the user computing device.
- This method may further include monitoring one or more user hand position tracking devices configured to track a position of a user's hand to determine what the user is pointing at in the virtual environment and interpreting the pointing as the input from the user. It is contemplated that the input from the user comprises an audio input or an input from the user to the one or more external input devices.
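- One plausible way to determine what a user is pointing at, sketched below with hypothetical scene data, is to treat the tracked hand pose as a ray and select the object whose direction from the hand is closest to that ray.

```python
import math

# Hypothetical objects in the virtual environment: name -> (x, y, z) position.
SCENE = {"mixer": (1.0, 0.0, 2.0), "oven": (-2.0, 0.0, 3.0), "salmon": (0.2, -0.1, 1.0)}

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def pointed_object(hand_pos, hand_dir, max_angle_deg=15.0):
    """Return the scene object whose direction from the hand is closest to the pointing ray."""
    hand_dir = normalize(hand_dir)
    best, best_angle = None, max_angle_deg
    for name, pos in SCENE.items():
        to_obj = normalize(tuple(p - h for p, h in zip(pos, hand_pos)))
        cos_angle = max(-1.0, min(1.0, sum(a * b for a, b in zip(hand_dir, to_obj))))
        angle = math.degrees(math.acos(cos_angle))
        if angle < best_angle:
            best, best_angle = name, angle
    return best

# The resolved object can then be attached to the user's question as input context.
print(pointed_object(hand_pos=(0.0, 0.0, 0.0), hand_dir=(0.1, -0.05, 1.0)))
```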
- the step of generating video content and audio content which did not previously exist occurs by processing existing video, audio, or both of a person represented by the virtual personification to generate new content.
- Figure 1A illustrates a first exemplary embodiment of the present virtual personification AI system integrated into a virtual reality system.
- Figure 1B illustrates a second exemplary embodiment of the virtual personification AI system which may use a local AI operating on a separate user device such as a smartphone, a tablet, a personal computer, etc.
- Figure 2 illustrates an exemplary environment of use of the virtual personification AI system.
- Figure 3 illustrates a block diagram of an example embodiment of a computing device, also referred to as a user device which may or may not be mobile.
- FIG. 4 is a block diagram of an exemplary computing device, mobile device, or server, such as one of the devices described above, according to one exemplary embodiment.
- AI services: services provided as procedures and methods to a program to accomplish artificial intelligence goals. Examples may include, but are not limited to, image modeling, text modeling, forecasting, planning, recommendations, search, speech processing, audio processing, audio generation, text generation, image generation, and many more.
- a device is any element running with at least a CPU, or a system that is used to interface with such a device.
- an accelerator can be attached in the form of a GPU or other specialized hardware accelerator. This accelerator can speed up the computation of AI services.
- An application is any software running on any device, such as mobile devices, laptops, desktops, servers, smart watches, tablets, home speakers, wearable devices (including smart rings, glasses, and hearing aids), CarPlay devices, security cameras, webcams, televisions, projection screen monitors, sound bars, personal computers, headphones, and earbuds, where a user can interact by touch, audio, visually, or passively.
- a virtual personification system may analyze pre-recorded data to generate dynamic responses to user requests/questions through virtual personifications.
- the virtual personification may be a virtual representation which may be based on a real person, for example, the user, a family member or relative, a famous person, a historical figure, or any other type of person.
- the virtual representation may also be a user or computer created person that does not represent a real person.
- Pre-recorded data may include image, video, or audio footage of the real person (such as YouTube and other film footage). Dynamic responses are generated to user requests/questions related to that known person, even though the pre-recorded data may not include an adequate response or a response that matches the question.
- a user may wish to be provided with a recipe from a famous chef, such as Gordon Ramsey, to make grilled salmon.
- the virtual personification may analyze Gordon Ramsey’s footage on making grilled chicken and grilled potatoes to generate a virtual personification of Gordon Ramsey guiding the user through the process of making grilled salmon, as if Gordon Ramsey were in a cooking show and personally providing detailed instructions to the specific user request.
- the system AI can pull details from prior recordings and manipulate the visual and audio files to create a new virtual representation that is directly and accurately responsive to the user's request. AI may generate new information, such as how to adjust the response to be responsive to the specific user request.
- AI can understand the user's request, analyze the information already provided by the chef about how to cook chicken, recognize that chicken is not salmon, search for a salmon recipe by the same chef or a comparable recipe, and then process the new recipe and the virtual representation to present the new recipe to the user of the system using the virtual representation, as if the original chef were actually providing the recipe for salmon rather than chicken.
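- The following sketch illustrates, with purely hypothetical data, how such a request could be satisfied by first checking pre-recorded footage, then falling back to a wider recipe database, and finally re-voicing the result in the chef's persona before rendering.

```python
# Hypothetical pre-recorded footage index and a wider recipe database, used to
# sketch the "chicken footage -> salmon answer" adaptation described above.
FOOTAGE = {"grilled chicken": "season, oil the grate, 6 min per side",
           "grilled potatoes": "parboil, then char over high heat"}
RECIPE_DB = {"grilled salmon": "oil the skin, grill skin-side down 4 min, flip for 2"}

def generate_chef_response(request: str) -> str:
    dish = request.lower().replace("how do i make ", "").strip("?")
    if dish in FOOTAGE:                      # direct answer already exists in footage
        steps = FOOTAGE[dish]
    elif dish in RECIPE_DB:                  # no footage: pull a recipe from another source
        steps = RECIPE_DB[dish]
    else:                                    # nothing found: adapt the closest footage
        steps = "adapted from: " + "; ".join(FOOTAGE.values())
    # Re-voice the answer in the chef's persona before rendering audio/video.
    return f"Right, {dish}: {steps}. Simple."

print(generate_chef_response("How do I make grilled salmon?"))
```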
- This example may be applied to any other topic or environment of use.
- the virtual personification of Gordon Ramsey may use a voice that sounds like Gordon Ramsey, may be dressed like Gordon Ramsey, as he typically appears on cooking shows, and may mimic Gordon Ramsey’s body language and speech pattern.
- AI may be used to create the virtual personification even in situations when the actual person never actually provided a responsive answer in a video or audio recording.
- the virtual representation may be created using built-in AI modules such as a virtual personification rendering module (discussed in more detail below) or using third-party tools, which the virtual personification system may interface with.
- the user may attempt Gordon Ramsey's recipe for scrambled eggs, which may already be available on YouTube, and which may involve the use of milk. However, upon determining that he has no milk in the fridge, the user may wish to ask Gordon Ramsey whether whipped cream may be used as a substitute. While Gordon Ramsey may not have answered that question in the existing YouTube footage, the virtual personification system may analyze Gordon Ramsey's footage on substituting other items for milk to generate a virtual personification of Gordon Ramsey to answer this user question.
- the virtual personification of Gordon Ramsey may include a prediction of Gordon Ramsey’s typical reaction in such situations.
- the AI may determine, based on pre-recorded data, that Gordon Ramsey typically reacts impatiently to such questions. Thus, the virtual personification of Gordon Ramsey may display a frown or curt gestures when providing the predicted answer.
- the virtual personification may be presented in a virtual reality space, which may be rendered using a virtual reality system.
- the virtual reality space may be a kitchen.
- for topics such as carpentry, the environment may be a woodworking shop; car repair would appear in an auto garage; education may appear as a classroom; and information about a topic may even appear inside the items themselves, such as inside a virtual computer or a virtual engine, to show how something works, in combination with AI that creates answers for the user using the virtual reality space and the virtual personification.
- FIG. 1A illustrates a first exemplary embodiment of the present virtual personification AI system integrated into a virtual reality system.
- the virtual reality space is rendered by a virtual reality system.
- Exemplary virtual reality systems are described in U.S. Patent No 9,898,091, U.S. Patent Publication 2014/0364212, and U.S. Patent Publication 2015/0234189, which are incorporated by reference herein in their entirety as teaching exemplary virtual reality systems and methods.
- a user 100A may access the virtual reality space by the one or more components of a virtual reality system, such as a virtual reality device (“VR device”) 104A and external input devices 108A, which may be accessories to the VR device 104A.
- the VR device 104A may be in direct communication with the external input devices 108A (such as by Bluetooth®) or via a network 112A providing internet or other signals (e.g., a personal area network, a local area network ("LAN"), a wireless LAN, a wide area network, etc.).
- the VR device 104A may also communicate with a remote AI 116A via the network 112A.
- the VR device 104A may be a wearable user device such as a virtual reality headset (“VR headset”), and the external input devices 108A may be hand-held controllers where a user may provide additional input such as arm motion, hand gestures, and various selection or control input through buttons or joysticks on such controllers.
- the VR device may generally include input devices 120A through 128A, input processing modules 132A, VR applications 134A, output rendering modules 138A, output devices 156A, 160A, and a communication module 164A.
- Input devices may include one or more audio input devices 120A (such as microphones), one or more position tracking input devices 124A (to detect a user's position and motion), and one or more facial tracking input devices 128A (such as facial cameras to detect facial expressions, eye-tracking cameras to detect gaze and eye movement, etc.). Additional external input devices may provide user biometrics data or tracking of other user body parts.
- the input processing modules 132A may include, but are not limited to, an external input processing module 142A (used to process external inputs such as input from external devices 108A or additional external input devices discussed above), an audio input processing module 144A (used to process audio inputs, such as user speech or sounds), a position input processing module 146A (to process position and motion tracking inputs such as hand motions, finger motions, arm motions, head position), and a facial input processing module 148A (to process facial inputs of the user).
- the VR applications 134A are generally responsible for rendering virtual reality spaces associated with their respective VR applications 134A.
- a VR museum application may render a virtual museum through which a user may traverse, presenting various artwork which the user may view or interact with. This is achieved through the VR application's 134A integration with output rendering modules 138A, which in turn present the rendered files on output devices 156A, 160A.
- the output rendering modules 138A may include, but are not limited to, an audio output processing module 150A responsible for processing audio files, and an image and/or video output processing module 152A, responsible for processing image and/or video files.
- one or more audio output devices 156A such as built-in speakers on the VR headset may present the processed audio file
- one or more image and/or video output devices 160A may display the processed image and/or video files.
- Other types of output may include, but are not limited to, motion or temperature changes to the VR device 104A or the external input devices 108A (such as vibration on hand-held controllers).
- User interaction may in turn modify the virtual reality space. For example, if a user inputs motion to indicate he picked up a vase, the rendered virtual reality space may display a vase moving in accordance with the user’s motion.
- the transmission of information occurs in a bi-directional streaming fashion, from the user 100A to the VR device 104A and/or external input devices 108A, then from the VR device 104A and/or external input devices 108A back to the user 100A.
- U.S. Application 17/218,021 provides a more detailed discussion on bi-directional streaming using AI services and examples of broader and specific uses.
- the AI may be completely or partially built into the VR device 104A or specific VR applications 134A. Such built-in AI components may be referred to as a local AI 168A. Other AI components may be located in the remote AI 116A, which may be operating on remote devices or on cloud-based servers. The local and remote AI 168A, 116A may communicate via the network 112A. The AI may enhance the user's 100A interaction with the virtual reality system using the embodiments and methods described above.
- the AI may include one or more of the following components to generally operate the AI and process data: one or more processors 172 and one or more memory storage devices where logic modules 176 and machine learning modules 178 may be stored to provide general AI services.
- the memory storage devices may further include one or more modules to specifically enhance user-VR interaction, such as speech-to-text modules 180, non-verbal input processing modules 182, text augmentation modules 184, conversation management modules 186, response generation modules 188, audio rendering and updating modules 190, virtual personification rendering modules 192, virtual personification prediction modules 194, and integration modules 196.
- the speech-to-text modules 180 may be used to perform voice detection and customized speech to text recognition, as well as to generally detect, recognize, process, and interpret user audio input. Recognition allows the speech-to-text modules 180 to distinguish between verbal input (such as a user question) and non-verbal input (such as the user’s sigh of relief).
- a user may start an active conversation in the virtual reality space by simply speaking.
- the speech-to-text modules 180 may use voice activity detection in order to differentiate that the user has started speaking, as opposed to ambient noise activity.
- the speech-to-text modules 180 may process the input audio from the microphone to recognize the user’s spoken text. This processing can either happen as part of the viewing device (such as the VR device 104A), on a device connected to the viewing device, or on a remote server over the network (such as the remote Al 116A). This process may convert the stream of audio into the spoken language, such as text processable by a computer.
- the speech-to-text modules 180 may be customized to the current scene that the user is experiencing inside the virtual space, or a virtual personification that the user wishes to interact with. This customization could allow for custom vocabulary to be recognized when it would make sense in the specific environment or for the specific virtual personification. For example, if a user were interacting with a virtual personification of a cooking chef, then the speech recognition system may be customized to enhance recognition of words associated with food, whereas in a different environment a different vocabulary would be used. If the virtual personification of Gordon Ramsey were in a kitchen, then the speech recognition system may be customized to enhance name recognition for kitchen utensils.
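- A minimal, self-contained sketch of the two ideas above is shown below, assuming simple energy-based voice activity detection and vocabulary-biased selection among candidate transcripts; the thresholds, candidate hypotheses, and vocabulary are illustrative only.

```python
def voice_activity(frames, threshold=0.02):
    """Very simple energy-based VAD: True for frames likely to contain speech."""
    return [sum(s * s for s in frame) / len(frame) > threshold for frame in frames]

def pick_transcript(candidates, scene_vocabulary):
    """Prefer the candidate transcript that uses more words from the current scene."""
    def score(text):
        words = text.lower().split()
        return sum(1 for w in words if w in scene_vocabulary) / max(len(words), 1)
    return max(candidates, key=score)

frames = [[0.001, -0.002], [0.2, -0.18], [0.25, 0.12]]         # fake audio frames
print(voice_activity(frames))                                  # [False, True, True]

kitchen_vocab = {"whisk", "saute", "salmon", "simmer"}
candidates = ["sought a pan", "saute the salmon"]              # competing ASR hypotheses
print(pick_transcript(candidates, kitchen_vocab))              # "saute the salmon"
```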
- the AI's speech-to-text modules 180 are intended to integrate with and enhance existing features in the virtual reality system.
- the AI speech-to-text modules 180 may generate many candidate interpretations from a single user input, automatically select the top interpretation based on user data, and hold multi-turn conversations with the user as a continuation of that single user input.
- Appendix A includes a more detailed discussion of systems and methods for enhanced speech-to-text, the integration of enhanced speech-to-text with other applications outside the virtual reality system, and the additional mechanisms to recognize usable user input (as discussed in step 2) and to process out-of-scope user input.
- the AI's non-verbal input processing modules 182 may be used to process non-verbal input.
- audio input may be non-verbal (such as a user's sigh of relief, or tone of voice).
- external input devices 108A may include devices to track a user's biometrics or body parts other than arm, hand, and finger movement. Examples of devices to track a user's biometrics include but are not limited to smartwatches, Fitbits™, heart-rate monitors, blood pressure monitors, or any other devices which may be used to track a user's heart rate, oxygen level, blood pressure, or any other metrics that may track a user's body condition.
- Such input may all be processed using additional processing modules, which may be part of the virtual reality system (such as built into the VR device 104A), and/or may be part of the local or remote AI 168A, 116A.
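- As a hedged illustration of how such non-verbal signals might be fused, the following sketch maps a few biometric and audio cues to a coarse user-state label that downstream modules could attach to the interpreted text; the thresholds are arbitrary and purely illustrative.

```python
def classify_user_state(heart_rate_bpm, pitch_variance, sigh_detected):
    """Fuse a few non-verbal signals into a coarse user-state label (illustrative thresholds)."""
    if sigh_detected and heart_rate_bpm < 75:
        return "relieved"
    if heart_rate_bpm > 100 or pitch_variance > 0.6:
        return "stressed"
    return "neutral"

# A downstream module can attach this label to the interpreted text as extra context.
print(classify_user_state(heart_rate_bpm=72, pitch_variance=0.2, sigh_detected=True))   # relieved
print(classify_user_state(heart_rate_bpm=110, pitch_variance=0.3, sigh_detected=False)) # stressed
```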
- the text augmentation modules 184 may be used to add further context to the interpreted user 100A input.
- the speech-to-text modules 180 may supplement the spoken text with what the user is currently doing, or interacting with, to enhance its linguistic understanding of what the user has said. For example, this allows the AI to find co-references between what the user said and what they are looking at. For instance, if a user asks, "how old is this", the term "this" can be implied from what the user is currently looking at, touching, near, or pointing at in the virtual world.
- This functionality can be carried out by fusing any one or more of the following inputs: the user's head position, eye detection, and hand position, including placement, grip, pointing, controller position, and general orientation.
- the system may also fuse in non-controller related signals, such as biometrics from heart rate, breathing patterns, and any other biosensory information. This information is fused over time to detect not just instantaneous values but trends as well.
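- One possible realization of this fusion, shown below with hypothetical inputs, resolves a demonstrative such as "this" by weighting recent gaze targets, an explicit pointing target, and nearby objects; the weighting scheme is illustrative, not the disclosed algorithm.

```python
from collections import Counter

def resolve_reference(utterance, gaze_history, pointing_target, nearby_objects):
    """Resolve 'this'/'that' by voting across recent gaze, pointing, and proximity signals."""
    if "this" not in utterance.lower() and "that" not in utterance.lower():
        return None
    votes = Counter()
    for obj in gaze_history:          # fused over time: recent gaze targets each get a vote
        votes[obj] += 1
    if pointing_target:               # an explicit pointing gesture is weighted more heavily
        votes[pointing_target] += 3
    for obj in nearby_objects:        # mere proximity gets a weak vote
        votes[obj] += 0.5
    return votes.most_common(1)[0][0] if votes else None

referent = resolve_reference("how old is this?",
                             gaze_history=["vase", "vase", "painting"],
                             pointing_target="vase",
                             nearby_objects=["painting", "bench"])
print(referent)  # vase -> the question effectively becomes "how old is the vase?"
```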
- the text augmentation modules 184 may also be integrated with the non-verbal input processing modules 182 to receive further context. For example, in a multi-turn conversation where a user requests information, the user may input the word "okay". Conventional systems may, by default, cease communication because the response "okay" may be pre-coded as a command to terminate interaction.
- the text augmentation modules 184 may analyze the user’s tone to detect (1) boredom, and interpret “okay” as a request to shorten the information provided, (2) hesitation or confusion, and interpret “okay” as a request for additional information, (3) impatience, and interpret “okay” as a request to end the interaction.
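- A trivial sketch of this disambiguation step is shown below, assuming an upstream tone classifier already exists; the tone-to-intent mapping is purely illustrative.

```python
def interpret_okay(tone):
    """Map a detected tone to the intended meaning of a bare 'okay' (illustrative mapping)."""
    mapping = {
        "bored": "shorten the information being provided",
        "hesitant": "provide additional information",
        "confused": "provide additional information",
        "impatient": "end the interaction",
    }
    return mapping.get(tone, "acknowledge and continue")

for tone in ("bored", "confused", "impatient", "cheerful"):
    print(tone, "->", interpret_okay(tone))
```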
- the text augmentation modules' 184 integration with other devices and modules may not be linear. Rather, context from the virtual reality system may be used in one or more steps of speech interpretation. For example, in a multi-turn conversation (such as the conversation described above), at each turn of a user input, the speech-to-text modules may be used to generate the most accurate interpretation of the user's input, and the non-verbal input processing module 182 may be used to inject more context. Further, the AI's conversation management modules 186 may be integrated with the text augmentation modules 184 to generate the output used in single or multi-turn conversations.
- the conversation management modules 186 may classify the spoken text into different categories to facilitate the open-ended conversation. First the conversation management modules 186 may determine if a statement is meant to initiate a new conversation or one that continues an existing conversation. If the user is detected to initiate a new conversation, then the conversation management modules 186 may classify the result among categories.
- a first category may include user comments that may not necessarily require a strong response. For example, if a user states “this is really cool”, the conversation management modules 186 may render the virtual personification to respond with a more descriptive or expressive response in relation to what was remarked as being cool. Alternatively, the virtual personification may not respond.
- a second category may include user questions that may be in relation to the current scene.
- a third category may be user questions that are in relation to the nonvirtualized world (i.e., reality).
- the conversation management modules 186 may facilitate an answer to the question via the virtual personification.
- the system may then proceed down to one of two or more paths.
- the conversation management modules 186 may first attempt to use information in pre-recorded data to answer the question. For example, during a user interaction with a virtual Gordon Ramsey on making a grilled salmon, a user may ask about the use of an ingredient not in the current recipe.
- the conversation management modules 186 may retrieve footage from another video where Gordon Ramsey uses that ingredient and may render the virtual Gordon Ramsey to modify the current recipe to include that ingredient.
- the conversation management modules 186 request the response generation modules 188 to analyze additional data (such as data on Gordon Ramsey's presentation of a similar alternative ingredient, or data based on other chefs or known cooking information) to generate new behavior, speech, actions, or responses for the virtual Gordon Ramsey (such as the output of an opinion that the ingredient may not be desirable, or the rendering of Gordon Ramsey adding the ingredient to the recipe using Gordon Ramsey's voice and behavior). If the user is in an existing conversation, then the conversation management modules 186 may proceed with the same approach as in the previous section, but with the added impetus of considering the context. Using context and past conversation details in the AI system provides a more realistic user interaction and prevents the virtual representation from repeating itself or providing the same response.
- the conversation management modules 186 may account for the likelihood that the user will continue to ask questions or follow ups to the previous response. The conversation management modules 186 may use this information to better carry out the next algorithm by utilizing this additional information.
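- The sketch below illustrates one hypothetical way the conversation manager could classify an utterance as a new or continuing conversation and route it among the categories described above; the keyword rules are placeholders for the AI classification actually contemplated.

```python
def classify_utterance(text, in_conversation):
    """Classify user speech for the conversation manager (keyword rules are illustrative)."""
    text_l = text.lower()
    kind = "continuation" if in_conversation else "new"
    if text_l.rstrip("?!. ").endswith(("cool", "amazing", "nice")):
        category = "comment"                 # may only need an expressive acknowledgement
    elif "?" in text and ("this" in text_l or "here" in text_l):
        category = "scene_question"          # about the current virtual scene
    elif "?" in text:
        category = "real_world_question"     # about the non-virtualized world
    else:
        category = "statement"
    return kind, category

def route(text, in_conversation=False):
    kind, category = classify_utterance(text, in_conversation)
    if category == "comment":
        return "respond expressively (or not at all)"
    if category == "scene_question":
        return "answer from pre-recorded data tied to the scene"
    if category == "real_world_question":
        return "search other databases, then render the personification answering"
    return "carry context into the next turn" if kind == "continuation" else "wait"

print(route("This is really cool"))
print(route("Can I use whipped cream instead of milk?", in_conversation=True))
```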
- the audio rendering and updating modules 190 may be used, both independently and/or in conjunction with other modules 188, to output audio.
- the audio rendering and updating modules 190 may ensure, for example, that the proper response to a user question to virtual Gordon Ramsey may indeed be in Gordon Ramsey’s voice, tone, and speech pattern.
- the audio rendering and updating modules 190 may determine Gordon Ramsey frequently accompanies his speech with an exasperated sigh. As a result, the audio rendering and updating modules 190 may cause the virtual personification of Gordon Ramsey to output the same exasperated sigh in its user interactions. Additional analysis may be performed to customize speech of the virtual personification in the following areas: volume (such as a person who frequently speaks in raised voice), accent, tone (such as a person who likes to emphasize certain words), speech pattern, etc.
- the audio rendering and updating modules 190 may also supplement the user experience with the appropriate background noise or music, which may be updated based on user action. For example, as virtual Gordon Ramsey guides the user through the experience of baking a cake, at one point virtual Gordon Ramsey may show the user how to turn on a cake mixer. In addition to the visual rendering of a cake mixer that is currently in use, the audio rendering and updating modules 190 may update the ambient noise to the background noise of an operating cake mixer. At a later point, when virtual Gordon Ramsey turns the cake mixer off, the audio rendering and updating modules 190 may revert the background noise of an operating cake mixer back to ambient noise. This would be true for any and all sounds in a virtual environment.
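- A small sketch of this dynamic ambient-audio behavior is shown below, assuming a hypothetical renderer that layers named sound loops over a base ambience track.

```python
class AmbientAudio:
    """Toggle background sounds as virtual appliances are switched on and off (illustrative)."""
    def __init__(self):
        self.active = set()

    def appliance_changed(self, name, running):
        if running:
            self.active.add(name)
        else:
            self.active.discard(name)
        return self.mix()

    def mix(self):
        # The renderer would layer these loops over the base ambience.
        return ["room ambience"] + sorted(f"{a} loop" for a in self.active)

audio = AmbientAudio()
print(audio.appliance_changed("cake mixer", running=True))   # mixer noise added
print(audio.appliance_changed("cake mixer", running=False))  # back to ambient noise only
```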
- the virtual personification rendering module 192 may be used, in conjunction with other modules, to output the visual representation of the virtual personification.
- the virtual personification rendering module 192 may be used to generate the virtual Gordon Ramsey, which may include a visual representation of one or more of the following: Gordon Ramsey’s face, body, and clothing.
- the virtual personification rendering module 192 processes existing images or video of the person or item being personified, maps and creates new video from that existing video, and is also able to create new video or images from prior images and video, creating a virtual personification that is doing or saying things which the original person never did or said.
- the virtual personification prediction modules 194 may be used, in conjunction with other modules, to predict behavior and responses by the virtual personification. Particularly in cases where no pre-recorded data exists and the AI must generate new responses/behavior, the virtual personification rendering module 192 may be used to render predictions of the virtual personification's responses, which may include one or more of the following: facial expressions, body movements, gestures, etc.
- the integration modules 196 may be used to integrate the one or more modules described above, so they may operate in sync.
- the response generation modules 188 may be integrated with the audio rendering and updating module to accurately output a virtual personification’s speech using the proper voice and tone, which may, in turn, be integrated with the virtual personification rendering modules 192 and the virtual personification prediction modules 194 to ensure the coherence between the output of voice, facial expression, and body movements.
- additional customization may be added to the virtual personification through additional modules.
- additional modules may be added to analyze a person’s temperament or personality and incorporate them into the virtual personification of that person.
- the AI presents two other main improvements over conventional systems: unlimited data and analysis, and dynamic and predictive rendering.
- a conventional rendering of a virtual Gordon Ramsey may be limited to existing video footage of Gordon Ramsey (such as, for example, a deep fake using Gordon Ramsey’s face, or the output of a pre-recording of Gordon Ramsey).
- a conventional software application used to render the virtual Gordon Ramsey may be limited to existing data in the local device.
- the AI may query any database accessible over the network to retrieve additional pre-recorded data, or generate renderings of virtual Gordon Ramsey in never-before-seen situations based on analysis of existing footage.
- additional information not currently existing on the internet may be predicted and generated.
- Figure 1B illustrates a second exemplary embodiment of the virtual personification AI system which may use a local AI 168B operating on a separate user device 190 such as a smartphone, a tablet, a personal computer, etc.
- Such separate user devices 190 may establish direct communication with the communication module 164 in the VR device 104B and/or a remote AI 116B housing additional AI modules, or may communicate with the VR device 104B and remote AI 116B via the network 112B.
- the various AI components 172-196 illustrated in Figure 1A and discussed above may be stored in the local AI 168B and/or the remote AI 116B.
- the external input devices 108B and the various VR device components 120B-164B may interact with each other and with the local and remote AIs 168B, 116B in similar fashion as described in Figure 1A.
- any one or more of the AI modules 172-196 may be included in the local AI 168 and/or the remote AI 116.
- all AI modules 172-196 may be located on a local AI 168 operating in the VR device 104 such that no remote AI 116 may be necessary.
- all AI modules 172-196 may be located in a remote AI 116.
- most or all AI modules 172-196 are in a remote AI 116 such that the AI may be integrated with any VR device 104, including VR devices 104 with no built-in local AI 168A. Such integration may be achieved using AI layers to power cross-platform AI services, which is discussed in more detail in U.S. Application 17/218,021.
- VR devices may be the preferred devices to implement the virtual personification AI system
- the virtual personification AI system may be used on any computing device capable of user interaction.
- the virtual personification AI system may be implemented on a device capable of augmented reality ("AR") display, such that the virtual personification may be output via AR technology.
- the virtual personification AI system may be implemented on smartphones, tablets, personal computers, laptop devices, etc., where the virtual personification may be a 2-dimensional output on a display screen.
- the virtual personification may be in audio-only mode, for implementation on a peripheral device (such as a vehicle with CarPlay) or wearable devices (such as smartwatches, smart rings, glasses, hearing aids, headphones, earbuds, etc.), home devices (such as home speakers, security cameras, webcams, televisions, projection screen monitors, sound bars, etc.), or any other electronic devices.
- FIG. 2 illustrates an exemplary environment of use of the virtual personification AI system.
- a user 200 may interact with the following devices, which may, as discussed above, be capable of implementing the virtual personification Al system: a VR device 204 (as illustrated in Figure 1), an AR device 208, or any other computing device 212 (such as a computer, a smart TV, or a smartphone).
- These devices 204-212 may then implement the AI modules 220 (which are separately illustrated as 172-196 in Figure 1A and discussed above).
- Such implementation may be, as discussed above, locally, remotely, or both.
- the AI modules 220 may be integrated with third-party tools such as virtual representation modules 216 and audio data modules 224.
- Virtual representation modules 216 may be any additional tools used to generate virtual personifications and virtual environments.
- Audio data modules 224 may be additional tools used to generate audio for virtual personifications and virtual environments.
- the AI modules 220 and their integrated third-party tools 216, 224 may be in direct communication, or communicate via a network 228, to access programs, servers, and/or databases stored in a cloud 232 and/or on cloud-based servers, as well as other devices 236, which may in turn be connected to their respective databases 240.
- the AI modules 220 may access the third-party tools 216, 224 remotely, such as via the network 228.
- the AI modules 220 may thus access resources from all connected programs, devices, servers, and/or databases.
- FIG. 3 illustrates an example embodiment of a mobile device on which a solution generator may operate, also referred to as a user device which may or may not be mobile.
- the mobile device 300 may comprise any type of mobile communication device capable of performing as described below.
- the mobile device may comprise a Personal Digital Assistant ("PDA"), cellular telephone, smart phone, tablet PC, wireless electronic pad, an IoT device, a "wearable" electronic device or any other computing device.
- the mobile device 300 is configured with an outer housing 304 designed to protect and contain the components described below.
- a processor 308 communicates over the buses 312 with the other components of the mobile device 300.
- the processor 308 may comprise any type processor or controller capable of performing as described herein.
- the processor 308 may comprise a general purpose processor, ASIC, ARM, DSP, controller, or any other type processing device.
- the processor 308 and other elements of the mobile device 300 receive power from a battery 320 or other power source.
- An electrical interface 324 provides one or more electrical ports to electrically interface with the mobile device, such as with a second electronic device, computer, a medical device, or a power supply/charging device.
- the interface 324 may comprise any type electrical interface or connector format.
- One or more memories 310 are part of the mobile device 300 for storage of machine readable code for execution on the processor 308 and for storage of data, such as image, audio, user, location, accelerometer, or any other type of data.
- the memory 310 may comprise RAM, ROM, flash memory, optical memory, or micro-drive memory.
- the machine readable code (software modules and/or routines) as described herein is non-transitory.
- the processor 308 connects to a user interface 316.
- the user interface 316 may comprise any system or device configured to accept user input to control the mobile device.
- the user interface 316 may comprise one or more of the following: microphone, keyboard, roller ball, buttons, wheels, pointer key, touch pad, and touch screen.
- a touch screen controller 330 which interfaces through the bus 312 and connects to a display 328.
- the display comprises any type display screen configured to display visual information to the user.
- the screen may comprise an LED, LCD, thin film transistor screen, OEL, CSTN (color super twisted nematic), TFT (thin film transistor), TFD (thin film diode), OLED (organic light-emitting diode), AMOLED display (active-matrix organic light-emitting diode), capacitive touch screen, resistive touch screen, or any combination of such technologies.
- the display 328 receives signals from the processor 308, and these signals are translated by the display into text and images as is understood in the art.
- the display 328 may further comprise a display processor (not shown) or controller that interfaces with the processor 308.
- the touch screen controller 330 may comprise a module configured to receive signals from a touch screen which is overlaid on the display 328.
- a speaker 334 and microphone 338 are also part of this exemplary mobile device.
- the speaker 334 and microphone 338 may be controlled by the processor 308.
- the microphone 338 is configured to receive and convert audio signals to electrical signals based on processor 308 control.
- the processor 308 may activate the speaker 334 to generate audio signals.
- a first wireless transceiver 340 and a second wireless transceiver 344 are connected to respective antennas 348, 352.
- the first and second transceivers 340, 344 are configured to receive incoming signals from a remote transmitter and perform analog front-end processing on the signals to generate analog baseband signals. The incoming signal may be further processed by conversion to a digital format, such as by an analog to digital converter, for subsequent processing by the processor 308.
- the first and second transceivers 340, 344 are configured to receive outgoing signals from the processor 308, or another component of the mobile device 300, and upconvert these signals from baseband to RF frequency for transmission over the respective antenna 348, 352.
- the mobile device 300 may have only one or two such systems, or more transceivers.
- some devices are tri-band or quad-band capable, or have Bluetooth®, NFC, or other communication capability.
- the mobile device 300 may be configured to operate according to any presently existing or future developed wireless standard including, but not limited to, Bluetooth, WI-FI such as IEEE 802.11 a,b,g,n, wireless LAN, WMAN, broadband fixed access, WiMAX, any cellular technology including CDMA, GSM, EDGE, 3G, 4G, 5G, TDMA, AMPS, FRS, GMRS, citizen band radio, VHF, AM, FM, and wireless USB.
- a global positioning system ("GPS") module may also be included to generate and provide location data regarding the location of the mobile device 300.
- a gyroscope 364 connects to the bus 312B to generate and provide orientation data regarding the orientation of the mobile device 300.
- a magnetometer 368 is provided to supply directional information to the mobile device 300.
- An accelerometer 372 connects to the bus 312B to provide information or data regarding shocks or forces experienced by the mobile device. In one configuration, the accelerometer 372 and gyroscope 364 generate and provide data to the processor 308 to indicate a movement path and orientation of the mobile device 300.
- One or more cameras (still, video, or both) 376 are provided to capture image data for storage in the memory 310 and/or for possible transmission over a wireless or wired link, or for viewing at a later time.
- the one or more cameras 376 may be configured to detect an image using visible light and/or near-infrared light.
- the cameras 376 may also be configured to utilize image intensification, active illumination, or thermal vision to obtain images in dark environments.
- the processor 308 may process machine-readable code that is stored on the memory to perform the functions described herein.
- a flasher and/or flashlight 380, such as an LED light, is provided and is processor controllable.
- the flasher or flashlight 380 may serve as a strobe or traditional flashlight.
- the flasher or flashlight 380 may also be configured to emit near- infrared light.
- a power management module 384 interfaces with or monitors the battery 320 to manage power consumption, control battery charging, and provide supply voltages to the various devices, which may have different power requirements.
- FIG. 4 is a block diagram of an exemplary computing device, mobile device, or server, such as one of the devices described above, according to one exemplary embodiment.
- Computing device 400 is intended to represent various forms of digital computers, such as smartphones, tablets, kiosks, laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
- Computing device 450 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, and other similar computing devices.
- the components shown, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit the implementations described and/or claimed in this document.
- Computing device 400 includes a processor 402, memory 404, a storage device 406, a high-speed interface or controller 408 connecting to memory 404 and high-speed expansion ports 410, and a low-speed interface or controller 412 connecting to low-speed bus 414 and storage device 406.
- Each of the components 402, 404, 406, 408, 410, and 412, are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate.
- the processor 402 can process instructions for execution within the computing device 400, including instructions stored in the memory 404 or on the storage device 406, to display graphical information for a GUI on an external input/output device, such as display 416 coupled to high-speed controller 408.
- multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
- multiple computing devices 400 may be connected, with each device providing portions of the necessary operations (e.g., a server bank, a group of blade servers, or a multi-processor system).
- the memory 404 stores information within the computing device 400.
- the memory 404 is a volatile memory unit or units.
- the memory 404 is a non-volatile memory unit or units.
- the memory 404 may also be another form of computer-readable medium, such as a magnetic or optical disk.
- the storage device 406 is capable of providing mass storage for the computing device 400.
- the storage device 406 may be or contain a computer-readable medium, such as a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations.
- a computer program product can be tangibly embodied in an information carrier.
- the computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above.
- the information carrier is a computer- or machine-readable medium, such as the memory 404, the storage device 406, or memory on processor 402.
- the high-speed controller 408 manages bandwidth-intensive operations for the computing device 400, while the low-speed controller 412 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only.
- the high-speed controller 408 is coupled to memory 404, display 416 (i.e., through a graphics processor or accelerator), and to high-speed expansion ports 410, which may accept various expansion cards (not shown).
- low-speed controller 412 is coupled to storage device 406 and low- speed bus 414.
- the low-speed bus 414, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router (i.e., through a network adapter).
- the computing device 400 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 420, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 424. In addition, it may be implemented in a personal computer such as a laptop computer 422. Alternatively, components from computing device 400 may be combined with other components in a mobile device (not shown), such as device 450. Each of such devices may contain one or more computing devices 400, 450, and an entire system may be made up of multiple computing devices 400, 450 communicating with each other.
- Computing device 450 includes a processor 452, memory 464, an input/output device such as a display 454, a communication interface 466, and a transceiver 468, among other components.
- the computing device 450 may also be provided with a storage device, such as a micro-drive or other device(s), to provide additional storage.
- Each of the components 452, 464, 454, 466, and 468 are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
- the processor 452 can execute instructions within the computing device 450, including instructions stored in the memory 464.
- the processor 452 may be implemented as a chipset of chips that include separate and multiple analog and digital processors.
- the processor may provide, for example, for coordination of the other components of the computing device 450, such as control of user interfaces, applications run by the computing device 450, and wireless communication by the computing device 450.
- Processor 452 may communicate with a user through control interface 458 and display interface 456 coupled to a display 454.
- the display 454 may be a TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology.
- the display interface 456 may comprise appropriate circuitry for driving the display 454 to present graphical and other information to a user.
- the control interface 458 may receive commands from a user and convert them for submission to the processor 452.
- an external interface 462 may be provided in communication with processor 452, to enable near area communication of computing device 450 with other devices.
- external interface 462 may provide for wired communication, or in other implementations, for wireless communication, whilst multiple interfaces may also be used.
- the memory 464 stores information within the computing device 450.
- the memory 464 can be implemented as one or more of a computer-readable medium or media, a volatile or a non-volatile memory unit or units.
- Expansion memory 474 may also be provided and connected to the computing device 450 through expansion interface 472, which may include, for example, a SIMM (Single In Line Memory Module) card interface.
- expansion memory 474 may provide extra storage space and/or may also store applications or other information for the computing device 450.
- expansion memory 474 may include instructions to carry out or supplement the processes described above and may also include secure information.
- expansion memory 474 may be provided as a security module for computing device 450 and may be programmed with instructions that permit secure use of the same.
- the memory may include for example, flash memory and/or NVRAM memory, as discussed below.
- a computer program product is tangibly embodied in an information carrier.
- the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
- the information carrier is a computer- or machine-readable medium, such as the memory 464, expansion memory 474, or memory on processor 452, that may be received for example, over transceiver 468 or external interface 462.
- the computing device 450 may communicate wirelessly through communication interface 466, which may include digital signal processing circuitry where necessary. Communication interface 466 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur for example, through a radio-frequency transceiver 468. In addition, short-range communication may occur, such as using a Bluetooth, Wi-Fi, or other such transceiver (not shown). In addition, GPS (Global Positioning system) receiver module 470 may provide additional navigation- and location-related wireless data to the computing device 450, which may be used as appropriate by applications running on the computing device 450.
- the computing device 450 may also communicate audibly using audio codec 460, which may receive spoken information from a user and convert it to usable digital information. Audio codec 460 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the computing device 450. Such sound may include audio from voice telephone calls, recorded audio (e.g., voice messages, music files, etc.), and may also further include audio generated by applications operating on the computing device 450.
- the computing device 450 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 480. It may also be implemented as part of a smartphone 482, personal digital assistant, a computer tablet, or other similar mobile device.
- various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, especially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
- These various implementations can include applications in one or more computer programs that are executable and/or interpretable on a programmable system, including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- the systems and techniques described here can be implemented on a computer having a display device (e.g., an LCD (liquid crystal display) monitor, LED, or any other flat panel display) for displaying information to the user, a keyboard, and a pointing device (e.g., mouse, joystick, trackball, or similar device) by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile), and input from the user can be received in any form, including acoustic, speech, or tactile input.
- the systems and techniques described here may be implemented in a computing system (e.g., computing device 400 and/or 450) that includes a back end component, or that includes a middleware component (e.g., application server), or that includes a frontend component such as a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the systems and techniques described herein, or any combination of such back-end, middleware, or front-end components.
- the components of the system may be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network.
- the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- the virtual personification Al system may be used to implement many possible applications.
- a conventional weather application may display weather searchable by pre-determined parameters (such as location and date).
- the Al system may output a virtual personification of a popular on-camera meteorologist (such as Jim Cantore) to not only provide weather based on pre-determined parameters, but also to further interact with the user.
- a user may first request the current weather and a forecast for Hawaii, and then ask, “What clothes should I pack for my upcoming vacation?”
- a conventional weather application will not understand the user question because it may not remember the context (weather in Hawaii).
- the virtual personification Al system may not only understand the context of the user question but also accurately determine the user’s true request - a retrieval of the user’s calendar to determine the accurate date range and/or location for the upcoming vacation, a projection of the weather during that date range, and an analysis of proper attire given the weather, personal data (such as user preferences on clothing items), and location (which may take into account additional factors such as humidity, altitude, and local culture). Further, the Al may be capable of presenting the proper response to the user request using the virtual personification of Jim Cantore in a conversational format, even though that person has never answered that question or provided that particular response in the past.
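- By way of non-limiting illustration, the context-aware resolution of the packing question above may be sketched in Python as follows; the calendar lookup, forecast, and attire recommendation helpers are hypothetical placeholders rather than actual system interfaces, and the data values are invented for the example.

    from dataclasses import dataclass

    @dataclass
    class ConversationContext:
        last_location: str | None = None      # e.g. "Hawaii", carried over from the prior weather request

    def lookup_vacation_dates(calendar: dict, location: str) -> tuple[str, str]:
        # Hypothetical calendar lookup returning the trip's date range.
        return calendar.get(location, ("unknown start", "unknown end"))

    def forecast(location: str, dates: tuple[str, str]) -> dict:
        # Placeholder forecast; a real system would query a weather service.
        return {"high_f": 84, "low_f": 70, "rain": True}

    def recommend_attire(weather: dict, preferences: list[str]) -> list[str]:
        items = ["light shirts", "shorts"]
        if weather["rain"]:
            items.append("light rain jacket")
        items += [p for p in preferences if p not in items]   # fold in personal preferences
        return items

    def answer_packing_question(ctx: ConversationContext, calendar: dict,
                                preferences: list[str]) -> str:
        location = ctx.last_location                           # context remembered from the earlier turn
        dates = lookup_vacation_dates(calendar, location)
        weather = forecast(location, dates)
        attire = recommend_attire(weather, preferences)
        return f"For {location} ({dates[0]} to {dates[1]}), pack: " + ", ".join(attire)

    ctx = ConversationContext(last_location="Hawaii")
    print(answer_packing_question(ctx, {"Hawaii": ("2023-07-10", "2023-07-17")}, ["sandals"]))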
- pre-recorded data may be analyzed, and virtual personifications may be generated, for any person(s), not just famous ones.
- a user may submit family videos of a great-grandfather who has passed away, and the virtual personification system may render a virtual personification of the great-grandfather, who may interact with future generations. This concept may be applied to any person, living or deceased.
- the system may create virtual personifications which have an appearance different to anyone alive or previously alive.
- where no footage exists of a requested action, the Al may supplement the missing footage with its own. For example, if a user requests that Jillian Michaels demonstrate a standing core exercise that was never recorded, a new rendering may be generated using Jillian Michaels’ footage of other, but similar, actions, or footage of other people performing the requested exercise, rendered to appear as Jillian Michaels.
- the new rendering may be generated using deepfake or other technology to combine Jillian Michaels’ footage with generic footage of another person performing a standing core exercise.
- the Al may provide a set of default footage (such as a default virtual model performing standing core exercises) and superimpose whatever footage of Jillian Michaels may be available on the default model. It is contemplated that any other type of technology may be used to generate new renderings where no pre-recorded data exists.
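- As a non-limiting sketch of the fallback order described above, the choice among (1) the person’s own footage of similar actions, (2) re-rendered generic footage carrying the person’s likeness, and (3) superimposition onto a default virtual model might look as follows in Python; the clip names and the helper function are illustrative assumptions only.

    def select_rendering_strategy(person_clips: dict, generic_clips: dict,
                                  requested_action: str, similar_actions: list[str]):
        # (1) reuse the person's own footage of a similar action
        for action in similar_actions:
            if action in person_clips:
                return ("adapt_own_footage", person_clips[action])
        # (2) re-render generic footage of another person to appear as the subject
        if requested_action in generic_clips:
            return ("reface_generic_footage", generic_clips[requested_action])
        # (3) superimpose whatever footage exists onto a default virtual model
        return ("superimpose_on_default_model", "default_model")

    strategy = select_rendering_strategy(
        person_clips={"plank": "jm_plank.mp4"},
        generic_clips={"standing core exercise": "generic_core.mp4"},
        requested_action="standing core exercise",
        similar_actions=["standing oblique twist", "plank"],
    )
    print(strategy)   # ('adapt_own_footage', 'jm_plank.mp4')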
- Another possible expansion of the virtual personification Al system may be to generate and render entirely imaginary characters. For example, by combining a new character design for a two-headed animal with default models or existing footage of a dinosaur, a virtual personification of a new two-headed dinosaur may be generated, and its behavior may be based on analysis of a wolf, or it may interact with the user with the voice of an old man.
- virtual personification may be customized based on both pre-recorded data and user preferences. It is contemplated that the personification is not limited to people but may extend to other things (real or not), such as but not limited to animals, robots, cars, birds, fish, extinct species, alien creatures, created items or beings, or any other item.
- the virtual personification Al system’s ability to generate any type of virtual personification and its versatility of customization enable broad application across all types of technologies and environments.
- One exemplary use may be education, where classes and/or tutorials may be provided to students with virtual personification of an instructor on any subject, which may automate some or all of a student’s classroom experience without compromising the student’s personal interaction (such as to ask questions).
- the virtual instructor may draw information from existing knowledge or databases, thereby providing answers a live instructor may not have.
- the class may be taught by a virtual representation of someone famous, such as Albert Einstein, Sandra Day O’Connor, or Alexander Graham Bell.
- Another exemplary use may be training, which may include standardized training for professional purposes (such as customized professional training of airplane pilots using a virtual personification of a flight instructor and the cockpit), or training for hobbies or informal learning (such as cooking with a virtual personification of Gordon Ramsey).
- the virtual environment may react to different actions by the user, such as use of certain controls, to provide realistic training.
- the virtual personification Al system may be used to generate realtime response instructions, such as medical or emergency training.
- a 9-1-1 dispatcher may assist a caller to perform CPR by transmitting footage of a virtual personification of medical personnel performing CPR on a patient while waiting for an ambulance to arrive.
- the caller may interact with the medical personnel by asking questions such as “The patient is still not breathing, now what?” Answers may be pulled from a database, and the personification may actually perform the answer to the question so the user can see how to carry out the technique.
- This concept can be applied to teaching tools for medical procedures such that the virtual representations of the best doctors in the world can be created to show how to perform a procedure and dynamically respond to any question for personalized teaching.
- Yet another exemplary use may be entertainment, such as allowing users to interact with famous people from the past or an imaginary character in a conversational setting. This allows people to have personal interactions with people from history and interact using the Al databases such that the virtual representation can answer any question or perform any action in real time during user interaction. These famous people from which the virtual representation is created could be any person, famous for any reason.
- a database may be created for each person so that the Al system can accurately create visual, audio, and knowledge representations.
- Yet another example may be a simulated room where two virtual personifications may interact with each other instead of (or in addition to) interaction with users.
- a user may wish to simulate a philosophical debate between Socrates and Kant.
- the virtual room may be expanded to include entire virtual worlds and large populations of virtual characters. The user can learn from seeing how two or more virtual representations interact, such as in professional environments, military situations, formal engagements, or social interactions.
Abstract
A method and apparatus to generate and update a virtual personification using artificial intelligence, comprising a system configured to perform the following: receive data associated with a person, such as text files, audio files, image files, and video files; render a virtual personification of the person and output the virtual personification to a user, such as on a display screen; and then receive and interpret a user input to generate a user request and update the virtual personification. The update may include generating an audio output using the text files and the audio files of the person and/or generating a video output using the image files and the video files of the person. The audio output and the video output are presented to the user by the virtual personification and have not previously occurred by the person or thing represented by the virtual personification.
Description
METHOD AND SYSTEM FOR VIRTUAL INTELLIGENCE USER INTERACTION
1. Field of the Invention
[0001] The present invention is directed to a method and system to provide user interaction with virtual personifications using artificial intelligence (“Al”).
2. Description of the Related Art
[0002] Advancements in VR and AR technology now allow users to view real or simulated environments (referred to as virtual environments) using a screen-equipped headset or a traditional screen. Within these virtual environments, users have been able to view elements and move about to further explore the world. However, user interaction with current virtual avatar technology is typically based on pre-recorded, pre-scripted image or audio files. In other words, the user can look about the environment and travel from place to place within the environment, but beyond that, interaction with the virtual environment is limited.
[0003] Other systems allow for some interaction with the virtual environment to obtain information about items in the environment, such as clicking on an item to obtain additional information. However, the interaction with elements in the virtual environment is limited to pre-created or pre-recorded information that is typically no more than a short pre-recorded message or text, which is often non-responsive and no better than a frustrating voice script. These systems lack individualization to the particular user’s interests and specific questions and are sterile in that prior art systems are no better than simply reading an article or watching a video on a web site.
SUMMARY
[0004] To overcome the drawbacks of the prior art and provide additional benefits, disclosed is a system and method to generate and update a virtual personification using artificial intelligence by receiving data associated with a person, the data comprising one or more of the following: text files, audio files, image files, and video files, and rendering a virtual personification of the person and outputting the virtual personification to a user. Then, receiving and interpreting a user input to generate a user request and updating the virtual personification in response to the user request. The update comprises one or more of the following: responsive to the user request, generating an audio output using the text and audio files of the person, and responsive to the user request, generating a video output using the image files and the video files of the person, such that the audio output and the video output are presented to the user by the virtual personification. Furthermore, the audio output and the video output presented by the virtual personification have not previously occurred by the person or thing represented by the virtual personification.
[0005] In one embodiment, the virtual personification is of a person, either living or deceased. It is contemplated that the virtual personification may comprise an audio output and video output which are presented in a virtual environment of a type associated with the virtual personification. The virtual personification may comprise a representation of a non-living item.
[0006] In one embodiment, the method is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module. The remote
artificial intelligence module may be a computing device with a processor and memory storing machine readable code configured to receive the question or request from the user via the virtual reality device, process the question or request to derive a meaning, and perform one or more searches for answers to the question or request in databases unrelated to the virtual personification. Upon locating an answer to the question or request, generating data that represents the virtual personification answering the question or request and transmitting the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user. It is also contemplated that the method may further comprise tracking a user’s hand position using one or more user hand position tracking devices to determine what the user is pointing at in the virtual environment. The step of generating a response to the question or request may use artificial intelligence to generate an answer by searching one or more databases that contain information from the person represented by the virtual personification but which do not provide a direct answer to the question or request.
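By way of non-limiting illustration, the remote artificial intelligence module’s handling of a question may be sketched in Python as follows; derive_meaning, search_databases, and render_answer are hypothetical stand-ins for the natural-language, search, and rendering components described above, and the sample data is invented.

    def derive_meaning(question: str) -> str:
        # Trivial normalization standing in for natural-language understanding.
        return question.lower().strip("?! ")

    def search_databases(meaning: str, databases: list[dict]) -> str | None:
        # Search databases unrelated to the personification for an answer.
        for db in databases:
            if meaning in db:
                return db[meaning]
        return None

    def render_answer(answer: str, persona: str) -> dict:
        # Data that represents the virtual personification speaking the answer.
        return {"persona": persona, "audio_text": answer, "video": f"{persona}_generated.mp4"}

    def handle_remote_request(question: str, persona: str, databases: list[dict]) -> dict | None:
        meaning = derive_meaning(question)
        answer = search_databases(meaning, databases)
        if answer is None:
            return None                        # nothing located; the device may ask a follow-up
        return render_answer(answer, persona)  # transmitted back to the virtual reality device

    print(handle_remote_request("How old is this vase?", "curator",
                                [{"how old is this vase": "About 2,400 years old."}]))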
[0007] Also disclosed is a system for presenting an interactive, artificial intelligence assisted, virtual personification to a user comprising a virtual reality device configured to have at least a portion be worn by the user. The virtual reality device includes a wearable screen configured for viewing by a user, one or more speakers configured to provide audio output to the user, a microphone configured to receive audio input from the user, and one or more external input devices configured to receive input from the user. Also part of the virtual reality device includes a communication module configured to communicate over a computer network or Internet, and a processor with access to a memory. The processor executes machine readable code and the memory
is configured to store the machine readable code. The machine readable code is configured to present a virtual environment on the wearable screen and through the one or more speakers to the user and present, to the user on the wearable screen and through the one or more speakers, a virtual personification of a person currently living or deceased, in the virtual environment. The code is also configured to receive a question or request from the user regarding one or more aspects of the virtual environment or the virtual personification and then generate a response to the question or request from the user, which includes generating video content and audio content which did not previously exist. The code then presents the generated response to the user on the wearable screen and through the one or more speakers in response to the question or request from the user.
[0008] In one embodiment, the machine readable code is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module. It is further contemplated that the remote artificial intelligence module may be a computing device with a memory and a processor such that the memory stores machine readable code configured to receive the question or request from the user via the virtual reality device, process the question or request to derive a meaning, and perform one or more searches for answers to the question or request in databases unrelated to the virtual personification. Then, upon locating an answer to the question or request, generating data that represents the virtual personification answering the question or request, and transmitting the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user.
[0009] The system may further comprise one or more user hand position tracking devices configured to track a position of a user’s hand to determine what the user is pointing at in the virtual environment. In one embodiment, the input from the user comprises an audio input or an input to the one or more external input devices. It is contemplated that the video content and audio content which did not previously exist are generated by processing existing video, audio, or both, of the person represented by the virtual personification, to form the video content and audio content which did not previously exist. In addition, the generated response to the question or request uses artificial intelligence to generate an answer by searching one or more databases that contain information from a person represented by the virtual personification but which do not provide a direct answer to the question or request.
[0010] Also disclosed herein is a method for presenting an interactive experience with a virtual personification using a screen, speakers, and microphone of a user computing device. In one embodiment, the method comprises presenting a virtual environment on the screen and through the one or more speakers to the user and presenting the virtual personification in the virtual environment. Then, receiving input from the user comprising a question, a user request, or a subject regarding one or more aspects of the virtual environment, the virtual personification, or the actions of the virtual personification in the virtual environment. This method then sends a request for a response to the input from the user to an Al computing device that is remote from the user computing device and, with the Al computing device, creates a response based on pre-existing content stored in one or more databases, which is processed to create the generated response. Then, transmitting the generated response to the user computing device and, at the user computing device, based on the generated response from the Al computing device, generating video content and audio content which did not previously exist. Finally, the method of operation presents the video content and audio content which did not previously exist to the user.
[0011] In one embodiment, the Al computing device is a computing device with a memory and a processor such that the memory stores machine readable code configured to receive the input from the user computing device, process the input from the user to derive a meaning, and based on the meaning, perform one or more searches for answers to the input from the user in databases unrelated to the virtual personification. Upon locating a response to the input from the user, generate data that represents the virtual personification answering the question or request, and transmit the data that represents the virtual personification responding to the input from the user to the user computing device.
[0012] This method may further include monitoring one or more user hand position tracking devices configured to track a position of a user’s hand to determine what the user is pointing at in the virtual environment and interpreting the pointing as the input from the user. It is contemplated that the input from the user comprises an audio input or an input from the user to the one or more external input devices. The step of generating video content and audio content which did not previously exist occurs by processing existing video, audio, or both of a person represented by the virtual personification to generate new content.
[0013] Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods,
features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
DESCRIPTION OF THE FIGURES
[0014] The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. In the figures, like reference numerals designate corresponding parts throughout the different views.
[0015] Figure 1A illustrates a first exemplary embodiment of the present virtual personification Al system integrated into a virtual reality system.
[0016] Figure 1B illustrates a second exemplary embodiment of the virtual personification Al system which may use a local Al operating on a separate user device such as a smartphone, a tablet, a personal computer, etc.
[0017] Figure 2 illustrates an exemplary environment of use of the virtual personification Al system.
[0018] Figure 3 illustrates a block diagram of an example embodiment of a computing device, also referred to as a user device which may or may not be mobile.
[0019] Figure 4 is a block diagram of an exemplary computing device, mobile device, or server, such as one of the devices described above, according to one exemplary embodiment.
DETAILED DESCRIPTION OF THE INVENTION
GLOSSARY OF TERMS
[0023] The following terms are used in this document and the following definitions are provided to aid in understanding but should not be interpreted as limiting in scope.
[0024] Al services: Services provided as procedures and methods to a program to accomplish artificial intelligence goals. Examples may include, but are not limited to, image modeling, text modeling, forecasting, planning, recommendations, search, speech processing, audio processing, audio generation, text generation, image generation, and many more.
[0025] Device: A device is any element running with a minimum of a CPU or a system which is used to interface with a device. Optionally, an accelerator can be attached in the form of a GPU or other specialized hardware accelerator. This accelerator can speed up the computation of Al services.
[0026] Application: An application is any software running on any device such as mobile devices, laptop, desktop, server, smart watches, tablets, home speakers, wearable devices including smart rings, glasses, hearing aids, CarPlay devices, security cameras, webcams, televisions, projection screen monitors, sound bars, personal computers, headphones, earbuds, and laptop devices where a user can interact with touch, audio, visual, or passively.
[0027] In this disclosure, a virtual personification system may analyze pre-recorded data to generate dynamic responses to user requests/questions through virtual personifications. In one embodiment, the virtual personification may be a virtual representation which may be based on a real person. For example, the user, a family member or relative, a famous person, a historical figure, or any other type of person. The
virtual representation may also be a user or computer created person that does not represent a real person. Pre-recorded data may include image, video, or audio footage of the real person (such as YouTube and other film footage). Dynamic responses are generated to user requests/questions related to that known person, even though prerecorded data may not include any adequate responses or responses which will match the question.
[0028] For example, a user may wish to be provided with a recipe from a famous chef, such as Gordon Ramsey, to make grilled salmon. Upon determination that pre-recorded data exists of Gordon Ramsey making other dishes, such as grilled chicken, the virtual personification system may analyze Gordon Ramsey’s footage on making grilled chicken and grilled potatoes to generate a virtual personification of Gordon Ramsey guiding the user through the process of making grilled salmon, as if Gordon Ramsey were in a cooking show and personally providing detailed instructions to the specific user request. The system Al can pull details from prior recordings and manipulate the visual and audio files to create a new virtual representation that is directly and accurately responsive to the user’s request. Al may generate new information, such as how to adjust the response to be responsive to the specific user request. In the example of the cooking question, Al can understand the user’s request, analyze the information already provided by the chef about how to cook chicken, realize that chicken is not salmon, then search for a salmon recipe by the same chef or a similar recipe, and then process the new recipe and the virtual representation to present the new recipe to the user of the system using the virtual representation, as if the original chef were actually providing the recipe for salmon and not chicken. Although applied to food, this example may be applied to any other topic or environment of use.
[0029] The virtual personification of Gordon Ramsey may use a voice that sounds like Gordon Ramsey, may be dressed like Gordon Ramsey, as he typically appears on cooking shows, and may mimic Gordon Ramsey’s body language and speech pattern. Al may be used to create the virtual personification even in situations when the actual person never actually provided a responsive answer in a video or audio recording. The virtual representation may be created using built-in Al modules such as a virtual personification rendering module (discussed in more detail below) or using third-party tools, which the virtual personification system may interface with.
[0030] In another example, the user may attempt Gordon Ramsey’s recipe for scrambled eggs, which may already be available on YouTube, and which may involve the use of milk. However, upon determining he has no milk in the fridge, the user may wish to ask Gordon Ramsey whether whipped cream may be used as a substitute. While Gordon Ramsey may not have provided an answer to that question in the existing footage on YouTube, the system may analyze Gordon Ramsey’s footage on substituting other items for milk to generate a virtual personification of Gordon Ramsey to answer this user question. The virtual personification of Gordon Ramsey may include a prediction of Gordon Ramsey’s typical reaction in such situations. For example, the Al may determine, based on pre-recorded data, that Gordon Ramsey typically reacts impatiently to such questions. Thus, the virtual personification of Gordon Ramsey may display a frown or curt gestures when providing the predicted answer.
[0031] In one embodiment, the virtual personification may be presented in a virtual reality space, which may be rendered using a virtual reality system. For example, in a cooking environment, the virtual reality space may be a kitchen. For other topics, such as carpentry, the environment may be a wood working shop; car repair would appear in an auto garage; education may appear as a classroom; and information about a topic may actually appear inside the items themselves, such as inside a virtual computer or a virtual engine, to show how something works in combination with Al that creates answers for the user using the virtual reality space and the virtual personification.
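By way of non-limiting illustration, the selection of a virtual reality space to match the topic may be sketched in Python as follows; the mapping table and the fallback environment are illustrative assumptions rather than a fixed list.

    # Simple mapping from conversation topic to the virtual space rendered
    # around the personification, following the examples above.
    TOPIC_ENVIRONMENTS = {
        "cooking": "kitchen",
        "carpentry": "wood working shop",
        "car repair": "auto garage",
        "education": "classroom",
    }

    def choose_environment(topic: str, default: str = "neutral studio") -> str:
        # Fall back to a generic space when the topic has no dedicated environment.
        return TOPIC_ENVIRONMENTS.get(topic, default)

    print(choose_environment("cooking"))     # kitchen
    print(choose_environment("astronomy"))   # neutral studio (fallback)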
[0032] Figure 1A illustrates a first exemplary embodiment of the present virtual personification Al system integrated into a virtual reality system. The virtual reality space is rendered by a virtual reality system. Exemplary virtual reality systems are described in U.S. Patent No. 9,898,091, U.S. Patent Publication 2014/0364212, and U.S. Patent Publication 2015/0234189, which are incorporated by reference herein in their entirety as teaching exemplary virtual reality systems and methods. A user 100A may access the virtual reality space by the one or more components of a virtual reality system, such as a virtual reality device (“VR device”) 104A and external input devices 108A, which may be accessories to the VR device 104A. The VR device 104A may be in direct communication with the external input devices 108A (such as by Bluetooth®) or via network 112A providing internet or signals (e.g., a personal area network, a local area network (“LAN”), a wireless LAN, a wide area network, etc.). The VR device 104A may also communicate with a remote Al 116A via the network 112A.
[0033] In a preferred embodiment, the VR device 104A may be a wearable user device such as a virtual reality headset (“VR headset”), and the external input devices 108A may be hand-held controllers where a user may provide additional input such as arm motion, hand gestures, and various selection or control input through buttons or joysticks on such controllers.
[0034] The VR device may generally include input devices 120A through 128A, input processing modules 132A, VR applications 134A, output rendering modules 138A, output devices 156A, 160A, and a communication module 164A. Input devices may include one or more audio input devices 120A (such as microphones), one or more position tracking input devices 124A (to detect a user’s position and motion), and one or more facial tracking input devices 128A (such as facial cameras to detect facial expressions, eye-tracking camera to detect gaze and eye movement, etc.). Additional external input devices may provide user biometrics data or tracking of other user body parts.
[0035] The input processing modules 132A may include, but are not limited to, an external input processing module 142A (used to process external inputs such as input from external devices 108A or additional external input devices discussed above), an audio input processing module 144A (used to process audio inputs, such as user speech or sounds), a position input processing module 146A (to process position and motion tracking inputs such as hand motions, finger motions, arm motions, head position), and a facial input processing module 148A (to process facial inputs of the user).
[0036] The VR applications 134A are generally responsible for rendering virtual reality spaces associated with their respective VR applications 134A. For example, a VR museum application may render a virtual museum through which a user may traverse and present various artwork which the user may view or interact with. This is achieved through the VR application’s 134A integration with output rendering modules 138A, which in turn presents the rendered files on output devices 156A, 160A.
[0037] Specifically, the output rendering modules 138A may include, but are not limited to, an audio output processing module 150A responsible for processing audio files, and an image and/or video output processing module 152A, responsible for processing image and/or video files. In turn, one or more audio output devices 156A, such as built-in speakers on the VR headset may present the processed audio file, and one or more image and/or video output devices 160A (such as a built-in screen on the VR headset) may display the processed image and/or video files. Other types of output may include, but are not limited to, motion or temperature changes to the VR device 104A or the external input devices 108A (such as vibration on hand-held controllers).
[0038] User interaction may in turn modify the virtual reality space. For example, if a user inputs motion to indicate he picked up a vase, the rendered virtual reality space may display a vase moving in accordance with the user’s motion. Thus, the transmission of information occurs in a bi-directional streaming fashion, from the user 100A to the VR device 104A and/or external input devices 108A, then from the VR device 104A and/or external input devices 108A back to the user 100A. U.S. Application 17/218,021 provides a more detailed discussion on bi-directional streaming using Al services and examples of broader and specific uses.
[0039] The Al may be completely or partially built into the VR device 104A or specific VR applications 134A. Such built-in Al components may be referred to as a local Al 168A. Other Al components may be located in the remote Al 116A, which may be operating on remote devices or on cloud-based servers. The local and remote Al 168A, 116A may communicate via the network 112A.
[0040] The Al may enhance the user’s 100A interaction with the virtual reality system using the embodiments and methods described above. The Al may include one or more of the following components to generally operate the Al and process data: one or more processors 172 and one or more memory storage devices where logic modules 176 and machine learning modules 178 may be stored to provide general Al services. The memory storage devices may further include one or more modules to specifically enhance user-VR interaction, such as speech-to-text modules 180, non-verbal input processing modules 182, text augmentation modules 184, conversation management modules 186, response generation modules 188, audio rendering and updating modules 190, virtual personification rendering modules 192, virtual personification prediction modules 194, and integration modules 196.
[0041] The speech-to-text modules 180 may be used to perform voice detection and customized speech to text recognition, as well as to generally detect, recognize, process, and interpret user audio input. Recognition allows the speech-to-text modules 180 to distinguish between verbal input (such as a user question) and non-verbal input (such as the user’s sigh of relief).
[0042] A user may start an active conversation in the virtual reality space by simply speaking. The speech-to-text modules 180 may use voice activity detection in order to differentiate that the user has started speaking, as opposed to ambient noise activity. When true speech is detected, the speech-to-text modules 180 may process the input audio from the microphone to recognize the user’s spoken text. This processing can either happen as part of the viewing device (such as the VR device 104A), on a device connected to the viewing device, or on a remote server over the network (such as the
remote Al 116A). This process may convert the stream of audio into the spoken language, such as text processable by a computer.
[0043] The speech-to-text modules 180 may be customized to the current scene that the user is experiencing inside the virtual space, or a virtual personification that the user wishes to interact with. This customization could allow for custom vocabulary to be recognized when it would make sense in the specific environment or specific virtual personification. For example, if a user were interacting with a virtual personification of a cooking chef, then the speech recognition system may be customized to enhance name recognition for words associated with food, whereas in a different environment a different vocabulary would be used. If the virtual personification of Gordon Ramsey were in a kitchen, then the speech recognition system may be customized to enhance name recognition for kitchen utensils.
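As a non-limiting sketch of this scene-dependent customization, a recognizer’s candidate transcripts may be re-scored against a scene vocabulary as follows in Python; the vocabulary lists, boost weight, and candidate scores are invented for the example.

    # Scene-dependent phrase lists the recognizer should favor.
    SCENE_VOCABULARY = {
        "kitchen": ["wok", "whisk", "saute", "sear", "colander", "ramekin"],
        "museum": ["fresco", "provenance", "baroque", "curator"],
    }

    def biased_hypothesis(candidates: list[tuple[str, float]], scene: str,
                          boost: float = 0.15) -> str:
        # candidates: (transcript, score) pairs from a generic recognizer.
        vocab = set(SCENE_VOCABULARY.get(scene, []))

        def rescored(item: tuple[str, float]) -> float:
            text, score = item
            hits = sum(1 for word in text.lower().split() if word in vocab)
            return score + boost * hits       # favor in-scene vocabulary

        return max(candidates, key=rescored)[0]

    print(biased_hypothesis([("heat the wok", 0.55), ("heed the walk", 0.57)], "kitchen"))
    # -> "heat the wok", despite the slightly lower generic score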
[0044] While the virtual reality system may have its own modules to process audio inputs, the Al’s speech-to-text modules 180 are intended to integrate with and enhance existing features in the virtual reality system. For example, the Al speech-to-text modules 180 may generate numerous interpretations from a single user input, automatically select the top interpretation based on user data, and hold multi-turn conversations with the user as a continuation of that single user input. Appendix A includes a more detailed discussion on systems and methods for enhanced speech-to-text, its integration with other applications outside the virtual reality system, and the additional mechanisms to recognize usable user input (as discussed in step 2) and to process out-of-scope user input.
[0045] The Al’s non-verbal input processing modules 182 may be used to process non-verbal input. As discussed above, audio input may be non-verbal (such as a user’s sigh of relief, or tone of voice). As well, external input devices 108A may include devices to track a user’s biometrics or body parts other than arm, hand, and finger movement. Examples of devices to track a user’s biometrics include but are not limited to smartwatches, Fitbits®, heart-rate monitors, blood pressure monitors, or any other devices which may be used to track a user’s heart-rate, oxygen level, blood pressure, or any other metrics that may track a user’s body condition. Such input may all be processed using additional processing modules, which may be part of the virtual reality system (such as built into the VR device 104A), and/or may be part of the local or remote Al 168A, 116A.
[0046] The text augmentation modules 184 may be used to add further context to the interpreted user 100A input. When the speech-to-text modules 180 have successfully transcribed the user’s spoken text, the text augmentation modules 184 may supplement the spoken text with what the user is currently doing, or interacting with, to enhance its linguistic understanding of what the user has said. For example, this allows the Al to find co-references between what the user said and what they are looking at. For instance, if a user asks, “how old is this”, the term “this” can be implied from what the user is currently looking at, touching, near, or pointing at in the virtual world. This functionality can be carried out by fusing any one or more of the following inputs: the user's head position, eye detection, hand position - including placement, grip, pointing, controller position, and general orientation. Furthermore, the system may also fuse in non-controller related signals, such as biometrics from heart rate, breathing patterns, and any other biosensory information. This information is fused over time to detect not just instantaneous values but trends as well.
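A non-limiting sketch of this co-reference resolution, in Python, is shown below; the gaze and pointing targets are assumed to come from the position and facial tracking inputs, and the object names are invented.

    def resolve_deixis(utterance: str, gaze_target: str | None,
                       pointed_target: str | None) -> str:
        # Only rewrite utterances that actually contain the deictic word.
        if "this" not in utterance.lower().split():
            return utterance
        # Prefer what the user is pointing at, then what they are looking at.
        referent = pointed_target or gaze_target
        if referent is None:
            return utterance                   # nothing to substitute; a follow-up question may be needed
        return utterance.replace("this", f"the {referent}")

    print(resolve_deixis("how old is this", gaze_target="vase", pointed_target=None))
    # -> "how old is the vase"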
[0047] The text augmentation modules 184 may also be integrated with the non-verbal input processing modules 182 to receive further context. For example, in a multi-turn conversation where a user requests information, the user may input the word “okay”. Conventional systems may, by default, cease communication because the response “okay” may be pre-coded as a command to terminate interaction. The text augmentation modules 184, in contrast, may analyze the user’s tone to detect (1) boredom, and interpret “okay” as a request to shorten the information provided, (2) hesitation or confusion, and interpret “okay” as a request for additional information, or (3) impatience, and interpret “okay” as a request to end the interaction.
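As a non-limiting sketch, the tone-dependent interpretation of “okay” described above may be expressed in Python as follows; the tone labels are assumed to be supplied by the non-verbal input processing modules 182.

    def interpret_okay(tone: str) -> str:
        # Map the detected tone of "okay" onto a conversational action.
        if tone == "bored":
            return "shorten_response"
        if tone in ("hesitant", "confused"):
            return "elaborate"
        if tone == "impatient":
            return "end_interaction"
        return "continue"                      # neutral acknowledgement

    print(interpret_okay("hesitant"))          # elaborate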
[0048] The text augmentation modules’ 184 integration with other devices and modules may not be linear. Rather, context from the virtual reality system may be used in one or more steps of speech interpretation. For example, in a multi-turn conversation (such as the conversation described above), at each turn of a user input, the speech-to-text modules may be used to generate the most accurate interpretation of the user’s input, and the non-verbal input processing module 182 may be used to inject more context. Further, the Al’s conversation management modules 186 may be integrated with the text augmentation modules 184 to generate the output used in single or multi-turn conversations.
[0049] Once the Al has augmented the text by considering the current state of the virtual space in relation to the user, then a conversation may be carried out. The conversation management modules 186 may classify the spoken text into different
categories to facilitate the open-ended conversation. First the conversation management modules 186 may determine if a statement is meant to initiate a new conversation or one that continues an existing conversation. If the user is detected to initiate a new conversation, then the conversation management modules 186 may classify the result among categories. A first category may include user comments that may not necessarily require a strong response. For example, if a user states “this is really cool”, the conversation management modules 186 may render the virtual personification to respond with a more descriptive or expressive response in relation to what was remarked as being cool. Alternatively, the virtual personification may not respond. A second category may include user questions that may be in relation to the current scene. A third category may be user questions that are in relation to the nonvirtualized world (i.e., reality).
[0050] In the second and third categories, the conversation management modules 186 may facilitate an answer to the question via the virtual personification. In the second category of a question being detected in relation to the virtual world, the system may then proceed down to one of two or more paths. The conversation management modules 186 may first attempt to use information in pre-recorded data to answer the question. For example, during a user interaction with a virtual Gordon Ramsey on making a grilled salmon, a user may ask about the use of an ingredient not in the current recipe. The conversation management modules 186 may retrieve footage from another video where Gordon Ramsey uses that ingredient and may render the virtual Gordon Ramsey to modify the current recipe to include that ingredient.
[0051] If no pre-recorded data exists, then the conversation management modules 186 request the response generation modules 188 to analyze additional data (such as data on Gordon Ramsey’s presentation of a similar alternative ingredient or based on other chefs or known cooking information) to generate new behavior, speech, actions, and responses for the virtual Gordon Ramsey (such as the output of an opinion that the ingredient may not be desirable, or the rendering of Gordon Ramsey adding the ingredient to the recipe using Gordon Ramsey’s voice and behavior). If the user is in an existing conversation, then the conversation management modules 186 may proceed with the same approach as in the previous section, but with the added impetus of considering the context. Using context and past conversation details in the Al system provides a more realistic user interaction and prevents the virtual representation from repeating itself or providing the same response.
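A non-limiting sketch of these two answer paths, in Python, follows; the pre-recorded clip index and the generation callback are hypothetical placeholders for the conversation management modules 186 and response generation modules 188.

    def answer_with_personification(question_topic: str, prerecorded: dict,
                                    generate_response) -> dict:
        # First path: adequate pre-recorded footage exists for the topic.
        clip = prerecorded.get(question_topic)
        if clip is not None:
            return {"source": "pre-recorded", "clip": clip}
        # Second path: no adequate footage, so synthesize new speech/behavior
        # from related data via the response generation components.
        return {"source": "generated", "clip": generate_response(question_topic)}

    prerecorded = {"capers": "ramsey_capers.mp4"}
    print(answer_with_personification("capers", prerecorded, lambda t: f"gen_{t}.mp4"))
    print(answer_with_personification("saffron", prerecorded, lambda t: f"gen_{t}.mp4"))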
[0052] After the response is provided, the conversation management modules 186 may account for the likelihood that the user will continue to ask questions or follow ups to the previous response. The conversation management modules 186 may use this information to better carry out the next algorithm by utilizing this additional information.
[0053] The audio rendering and updating modules 190 may be used, both independently and/or in conjunction with other modules 188, to output audio. The audio rendering and updating modules 190 may ensure, for example, that the proper response to a user question to virtual Gordon Ramsey may indeed be in Gordon Ramsey’s voice, tone, and speech pattern. For example, based on analysis of Gordon Ramsey’s past audio files, the audio rendering and updating modules 190 may
determine Gordon Ramsey frequently accompanies his speech with an exasperated sigh. As a result, the audio rendering and updating modules 190 may cause the virtual personification of Gordon Ramsey to output the same exasperated sigh in its user interactions. Additional analysis may be performed to customize speech of the virtual personification in the following areas: volume (such as a person who frequently speaks in raised voice), accent, tone (such as a person who likes to emphasize certain words), speech pattern, etc.
[0054] The audio rendering and updating modules 190 may also supplement the user experience with the appropriate background noise or music, which may be updated based on user action. For example, as virtual Gordon Ramsey guides the user through the experience of baking a cake, at one point virtual Gordon Ramsey may show the user how to turn on a cake mixer. In addition to the visual rendering of a cake mixer that is currently in use, the audio rendering and updating modules 190 may update the ambient noise to a background noise of an operating cake mixer. At a later point, when virtual Gordon Ramsey turns the cake mixer off, the audio rendering and updating modules 190 may update the background noise of an operating cake mixer back to ambient noise. This would be true for any and all sounds in a virtual environment.
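As a non-limiting sketch, keeping the background audio in sync with scene state, as in the cake mixer example, might be expressed in Python as follows; the sound file names are placeholders.

    class AmbientAudio:
        def __init__(self):
            self.active = {"room_tone.wav"}        # default ambient bed

        def on_appliance(self, name: str, running: bool) -> list[str]:
            sound = f"{name}_running.wav"
            if running:
                self.active.add(sound)             # layer the appliance noise in
            else:
                self.active.discard(sound)         # return to the ambient noise
            return sorted(self.active)

    audio = AmbientAudio()
    print(audio.on_appliance("cake_mixer", True))   # ['cake_mixer_running.wav', 'room_tone.wav']
    print(audio.on_appliance("cake_mixer", False))  # ['room_tone.wav']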
[0055] The virtual personification rendering module 192 may be used, in conjunction with other modules, to output the visual representation of the virtual personification. For example, the virtual personification rendering module 192 may be used to generate the virtual Gordon Ramsey, which may include a visual representation of one or more of the following: Gordon Ramsey’s face, body, and clothing. The virtual personification rendering module 192 processes existing image or video of the person
or item being personified and maps and creates new video from existing video and is also able to create new video or image from prior image and video to create a virtual personification that is doing or saying things which the original person never did or said.
[0056] The virtual personification prediction modules 194 may be used, in conjunction with other modules, to predict behavior and responses by the virtual personification. Particularly in cases where no pre-recorded data exists, and the Al must generate new responses/behavior, the virtual personification rendering module 192 may be used to render predictions on the virtual personifications’ responses, which may include one or more of the following: facial expressions, body movements, gestures, etc.
[0057] The integration modules 196 may be used to integrate the one or more modules described above, so they may operate in sync. For example, the response generation modules 188 may be integrated with the audio rendering and updating module to accurately output a virtual personification’s speech using the proper voice and tone, which may, in turn, be integrated with the virtual personification rendering modules 192 and the virtual personification prediction modules 194 to ensure the coherence between the output of voice, facial expression, and body movements. In one embodiment, additional customization may be added to the virtual personification through additional modules. For example, additional modules may be added to analyze a person’s temperament or personality and incorporate them into the virtual personification of that person.
[0058] Conventional virtual reality systems, and conversational Al systems, may not have the ability to hold multi-turn conversations. For example, following the user’s
request to add a first ingredient to a recipe, Gordon Ramsey may suggest the use of a second ingredient instead. The user may ask, “can you show me what that would look like”. Conventional virtual reality systems may not understand what “that” is referring to, may incorrectly identify that the user is still looking for a display of the first ingredient, or may not have the requested information. In addition, the Al system would generate an image or video showing what that added ingredient would look like even if such video or image footage never previously existed. In contrast, through the integration of speech-to-text modules 180, conversation management modules 186, and logic modules 176 (such as natural language understanding modules and fuzzy logic modules), the Al would understand that the user 100A, in continuation of the previous exchange, is referring to the second ingredient and may thus provide output that is truly responsive to the user’s second input.
[0059] In addition to robust speech interpretation and the multi-turn conversation feature, the AI presents two other main improvements over conventional systems: unlimited data and analysis, and dynamic and predictive rendering. Using the previous example, a conventional rendering of a virtual Gordon Ramsey may be limited to existing video footage of Gordon Ramsey (such as, for example, a deep fake using Gordon Ramsey’s face, or the output of a pre-recording of Gordon Ramsey). Further, a conventional software application used to render the virtual Gordon Ramsey may be limited to existing data on the local device. The AI, in contrast, may query any database accessible over the network to retrieve additional pre-recorded data, or generate renderings of virtual Gordon Ramsey in never-before-seen situations based on analysis of existing footage. Thus, not only is the entire universe of information available on the internet accessible to the user through the AI, but additional information not currently existing on the internet may be predicted and generated.
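The escalation from local data, to remote databases, to newly generated content can be pictured with the following hypothetical sketch; the data structures and the generator function are invented for illustration:

```python
# Hypothetical sketch of the escalation described above: look for an answer
# locally, then in remote databases, and finally generate new content when
# nothing pre-recorded exists. All functions and data are illustrative.

def answer_request(request, local_store, remote_databases, generator):
    if request in local_store:
        return local_store[request]
    for db in remote_databases:
        if request in db:
            return db[request]
    # No existing footage anywhere: synthesize a new rendering instead.
    return generator(request)

local_store = {"turn on the mixer": "clip_local_042"}
remote_databases = [{"fold in the flour": "clip_remote_117"}]
generator = lambda request: f"generated_clip({request})"

print(answer_request("turn on the mixer", local_store, remote_databases, generator))
print(answer_request("pipe the frosting", local_store, remote_databases, generator))
```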
[0060] Figure 1B illustrates a second exemplary embodiment of the virtual personification AI system, which may use a local AI 168B operating on a separate user device 190 such as a smartphone, a tablet, a personal computer, etc. Such separate user devices 190 may establish direct communication with the communication module 164 in the VR device 104B and/or a remote AI 116B housing additional AI modules, or may communicate with the VR device 104B and remote AI 116B via the network 112B. The various AI components 172-196 illustrated in Figure 1A and discussed above may be stored in the local AI 168B and/or the remote AI 116B. The external input devices 108B and the various VR device components 120B-164B may interact with each other and with the local and remote AIs 168B, 116B in a similar fashion as described for Figure 1A.
[0061] It is contemplated that in any embodiment, including but not limited to those of Figures 1A and 1B, any one or more of the AI modules 172-196 may be included in the local AI 168 and/or the remote AI 116. In one embodiment, all AI modules 172-196 may be located on a local AI 168 operating in the VR device 104 such that no remote AI 116 may be necessary. Alternatively, all AI modules 172-196 may be located in a remote AI 116. In preferred embodiments, most or all AI modules 172-196 are in a remote AI 116 such that the AI may be integrated with any VR device 104, including VR devices 104 with no built-in local AI 168A. Such integration may be achieved using AI layers to power cross-platform AI services, which is discussed in more detail in U.S. Application 17/218,021.
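As an illustration of how such module placement might be expressed in configuration (the mapping and the routing function below are hypothetical, not drawn from the disclosure), consider:

```python
# Hypothetical sketch of configuring where each AI module runs. The module
# names echo the description above, but the configuration format is invented.

MODULE_PLACEMENT = {
    "speech_to_text": "local",        # low latency on the VR device
    "conversation_management": "remote",
    "response_generation": "remote",
    "personification_rendering": "remote",
}

def route(module_name: str, payload: str) -> str:
    # Unlisted modules default to the remote AI so any VR device can be used.
    target = MODULE_PLACEMENT.get(module_name, "remote")
    if target == "local":
        return f"run {module_name} on device: {payload}"
    return f"send {payload} to remote AI for {module_name}"

print(route("speech_to_text", "raw audio buffer"))
print(route("response_generation", "user request text"))
```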
[0062] While VR devices may be the preferred devices on which to implement the virtual personification AI system, it is intended that the virtual personification AI system may be used on any computing device capable of supporting user interaction. For example, the virtual personification AI system may be implemented on a device capable of AR display, such that the virtual personification may be output via AR technology. Similarly, the virtual personification AI system may be implemented on smartphones, tablets, personal computers, laptop devices, etc., where the virtual personification may be a 2-dimensional output on a display screen. In one embodiment, the virtual personification may be in audio-only mode, for implementation on peripheral devices (such as a vehicle with CarPlay), wearable devices (such as smartwatches, smart rings, glasses, hearing aids, headphones, earbuds, etc.), home devices (such as home speakers, security cameras, webcams, televisions, projection screen monitors, sound bars, etc.), or any other electronic devices.
[0063] Figure 2 illustrates an exemplary environment of use of the virtual personification AI system. In Figure 2, a user 200 may interact with the following devices, which may, as discussed above, be capable of implementing the virtual personification AI system: a VR device 204 (as illustrated in Figure 1), an AR device 208, or any other computing device 212 (such as a computer, a smart TV, or a smartphone). These devices 204-212 may then implement the AI modules 220 (which are separately illustrated as 172-196 in Figure 1A and discussed above). Such implementation may occur, as discussed above, locally, remotely, or both.
[0064] In one embodiment, the AI modules 220 may be integrated with third-party tools such as virtual representation modules 216 and audio data modules 224. Virtual representation modules 216 may be any additional tools used to generate virtual personifications and virtual environments. Audio data modules 224 may be additional tools used to generate audio for virtual personifications and virtual environments.
[0065] The AI modules 220 and their integrated third-party tools 216, 224 may be in direct communication, or may communicate via a network 228 to access programs, servers, and/or databases stored in a cloud 232 and/or cloud-based servers, as well as other devices 236, which may in turn be connected to their respective databases 240. In one embodiment, the AI modules 220 may access the third-party tools 216, 224 remotely, such as via the network 228. The AI modules 220 may thus access resources from all connected programs, devices, servers, and/or databases.
[0066] Figure 3 illustrates an example embodiment of a mobile device on which a solution generator may operate, also referred to as a user device, which may or may not be mobile. This is but one possible mobile device configuration and as such, it is contemplated that one of ordinary skill in the art may differently configure the mobile device. The mobile device 300 may comprise any type of mobile communication device capable of performing as described below. The mobile device may comprise a Personal Digital Assistant (“PDA”), cellular telephone, smart phone, tablet PC, wireless electronic pad, an IoT device, a “wearable” electronic device or any other computing device.
[0067] In this example embodiment, the mobile device 300 is configured with an outer housing 304 designed to protect and contain the components described below. Within the housing 304 is a processor 308 and a first and second bus 312A, 312B (collectively 312). The processor 308 communicates over the buses 312 with the other components
of the mobile device 300. The processor 308 may comprise any type of processor or controller capable of performing as described herein. The processor 308 may comprise a general purpose processor, ASIC, ARM, DSP, controller, or any other type of processing device. The processor 308 and other elements of the mobile device 300 receive power from a battery 320 or other power source. An electrical interface 324 provides one or more electrical ports to electrically interface the mobile device with, for example, a second electronic device, a computer, a medical device, or a power supply/charging device. The interface 324 may comprise any type of electrical interface or connector format.
[0068] One or more memories 310 are part of the mobile device 300 for storage of machine readable code for execution on the processor 308 and for storage of data, such as image, audio, user, location, accelerometer, or any other type of data. The memory 310 may comprise RAM, ROM, flash memory, optical memory, or micro-drive memory. The machine readable code (software modules and/or routines) as described herein is non-transitory.
[0069] As part of this embodiment, the processor 308 connects to a user interface 316. The user interface 316 may comprise any system or device configured to accept user input to control the mobile device. The user interface 316 may comprise one or more of the following: microphone, keyboard, roller ball, buttons, wheels, pointer key, touch pad, and touch screen. Also provided is a touch screen controller 330, which interfaces through the bus 312 and connects to a display 328.
[0070] The display comprises any type of display screen configured to display visual information to the user. The screen may comprise an LED, LCD, thin film transistor screen, OEL, CSTN (color super twisted nematic), TFT (thin film transistor), TFD (thin film diode), OLED (organic light-emitting diode), AMOLED display (active-matrix organic light-emitting diode), capacitive touch screen, resistive touch screen, or any combination of such technologies. The display 328 receives signals from the processor 308, and these signals are translated by the display into text and images as is understood in the art. The display 328 may further comprise a display processor (not shown) or controller that interfaces with the processor 308. The touch screen controller 330 may comprise a module configured to receive signals from a touch screen which is overlaid on the display 328.
[0071] Also part of this exemplary mobile device is a speaker 334 and microphone 338. The speaker 334 and microphone 338 may be controlled by the processor 308. The microphone 338 is configured to receive and convert audio signals to electrical signals based on processor 308 control. Likewise, the processor 308 may activate the speaker 334 to generate audio signals. These devices operate as is understood in the art and as such, are not described in detail herein.
[0072] Also connected to one or more of the buses 312 is a first wireless transceiver 340 and a second wireless transceiver 344, each of which connect to respective antennas 348, 352. The first and second transceivers 340, 344 are configured to receive incoming signals from a remote transmitter and perform analog front-end processing on the signals to generate analog baseband signals. The incoming signal may be further processed by conversion to a digital format, such as by an analog to digital converter, for subsequent processing by the processor 308. Likewise, the first and second transceivers 340, 344 are configured to receive outgoing signals from the processor
308, or another component of the mobile device 300, and upconvert these signals from baseband to RF frequency for transmission over the respective antenna 348, 352. Although shown with a first wireless transceiver 340 and a second wireless transceiver 344, it is contemplated that the mobile device 300 may have only one such system, or more than two transceivers. For example, some devices are tri-band or quad-band capable, or have Bluetooth, NFC, or other communication capability.
[0073] It is contemplated that the mobile device 300, and hence the first wireless transceiver 340 and the second wireless transceiver 344, may be configured to operate according to any presently existing or future developed wireless standard including, but not limited to, Bluetooth, Wi-Fi such as IEEE 802.11a/b/g/n, wireless LAN, WMAN, broadband fixed access, WiMAX, any cellular technology including CDMA, GSM, EDGE, 3G, 4G, 5G, TDMA, AMPS, FRS, GMRS, citizen band radio, VHF, AM, FM, and wireless USB.
[0074] Also part of the mobile device 300 is one or more systems connected to the second bus 312B which also interfaces with the processor 308. These devices include a global positioning system (GPS) module 360 with associated antenna 362. The GPS module 360 is capable of receiving and processing signals from satellites or other transponders to generate data regarding the location, direction of travel, and speed of the GPS module 360. GPS is generally understood in the art and hence not described in detail herein. A gyroscope 364 connects to the bus 312B to generate and provide orientation data regarding the orientation of the mobile device 300. A magnetometer 368 is provided to supply directional information to the mobile device 300. An accelerometer 372 connects to the bus 312B to provide information or data regarding
shocks or forces experienced by the mobile device. In one configuration, the accelerometer 372 and gyroscope 364 generate and provide data to the processor 308 to indicate a movement path and orientation of the mobile device 300.
[0075] One or more cameras (still, video, or both) 376 are provided to capture image data for storage in the memory 310 and/or for possible transmission over a wireless or wired link, or for viewing at a later time. The one or more cameras 376 may be configured to detect an image using visible light and/or near-infrared light. The cameras 376 may also be configured to utilize image intensification, active illumination, or thermal vision to obtain images in dark environments. The processor 308 may process machine-readable code that is stored on the memory to perform the functions described herein.
[0076] A flasher and/or flashlight 380, such as an LED light, is provided and is processor controllable. The flasher or flashlight 380 may serve as a strobe or traditional flashlight. The flasher or flashlight 380 may also be configured to emit near-infrared light. A power management module 384 interfaces with or monitors the battery 320 to manage power consumption, control battery charging, and provide supply voltages to the various devices, which may have different power requirements.
[0077] Figure 4 is a block diagram of an exemplary computing device, mobile device, or server, such as one of the devices described above, according to one exemplary embodiment. Computing device 400 is intended to represent various forms of digital computers, such as smartphones, tablets, kiosks, laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Computing device 450 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, and other similar computing devices. The components shown, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit the implementations described and/or claimed in this document.
[0078] Computing device 400 includes a processor 402, memory 404, a storage device 406, a high-speed interface or controller 408 connecting to memory 404 and high-speed expansion ports 410, and a low-speed interface or controller 412 connecting to low-speed bus 414 and storage device 406. Each of the components 402, 404, 406, 408, 410, and 412 are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 402 can process instructions for execution within the computing device 400, including instructions stored in the memory 404 or on the storage device 406, to display graphical information for a GUI on an external input/output device, such as display 416 coupled to high-speed controller 408. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 400 may be connected, with each device providing portions of the necessary operations (e.g., a server bank, a group of blade servers, or a multi-processor system).
[0079] The memory 404 stores information within the computing device 400. In one implementation, the memory 404 is a volatile memory unit or units. In another implementation, the memory 404 is a non-volatile memory unit or units. The memory 404 may also be another form of computer-readable medium, such as a magnetic or optical disk.
[0080] The storage device 406 is capable of providing mass storage for the computing device 400. In one implementation, the storage device 406 may be or contain a computer-readable medium, such as a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 404, the storage device 406, or memory on processor 402.
[0081] The high-speed controller 408 manages bandwidth-intensive operations for the computing device 400, while the low-speed controller 412 manages less bandwidth-intensive operations. Such allocation of functions is exemplary only. In one implementation, the high-speed controller 408 is coupled to memory 404, display 416 (i.e., through a graphics processor or accelerator), and to high-speed expansion ports 410, which may accept various expansion cards (not shown). In this representative implementation, low-speed controller 412 is coupled to storage device 406 and low-speed bus 414. The low-speed bus 414, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router (i.e., through a network adapter).
[0082] The computing device 400 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 420,
or multiple times in a group of such servers. It may also be implemented as part of a rack server system 424. In addition, it may be implemented in a personal computer such as a laptop computer 422. Alternatively, components from computing device 400 may be combined with other components in a mobile device (not shown), such as device 450. Each of such devices may contain one or more computing devices 400, 450, and an entire system may be made up of multiple computing devices 400, 450 communicating with each other.
[0083] Computing device 450 includes a processor 452, memory 464, an input/output device such as a display 454, a communication interface 466, and a transceiver 468, among other components. The computing device 450 may also be provided with a storage device, such as a micro-drive or other device(s), to provide additional storage. Each of the components 452, 464, 454, 466, and 468, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
[0084] The processor 452 can execute instructions within the computing device 450, including instructions stored in the memory 464. The processor 452 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor may provide, for example, for coordination of the other components of the computing device 450, such as control of user interfaces, applications run by the computing device 450, and wireless communication by the computing device 450.
[0085] Processor 452 may communicate with a user through control interface 458 and display interface 456 coupled to a display 454. For example, the display 454 may be a
TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 456 may comprise appropriate circuitry for driving the display 454 to present graphical and other information to a user. The control interface 458 may receive commands from a user and convert them for submission to the processor 452. In addition, an external interface 462 may be provided in communication with processor 452 to enable near area communication of computing device 450 with other devices. In some implementations, external interface 462 may provide for wired communication, or in other implementations, for wireless communication, while multiple interfaces may also be used.
[0086] The memory 464 stores information within the computing device 450. The memory 464 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. Expansion memory 474 may also be provided and connected to the computing device 450 through expansion interface 472, which may include, for example, a SIMM (Single In Line Memory Module) card interface. Such expansion memory 474 may provide extra storage space and/or may also store applications or other information for the computing device 450. Specifically, expansion memory 474 may include instructions to carry out or supplement the processes described above and may also include secure information. Thus, for example, expansion memory 474 may be provided as a security module for computing device 450 and may be programmed with instructions that permit secure use of the same. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
[0087] The memory may include for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 464, expansion memory 474, or memory on processor 452, that may be received for example, over transceiver 468 or external interface 462.
[0088] The computing device 450 may communicate wirelessly through communication interface 466, which may include digital signal processing circuitry where necessary. Communication interface 466 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur for example, through a radio-frequency transceiver 468. In addition, short-range communication may occur, such as using a Bluetooth, Wi-Fi, or other such transceiver (not shown). In addition, GPS (Global Positioning system) receiver module 470 may provide additional navigation- and location-related wireless data to the computing device 450, which may be used as appropriate by applications running on the computing device 450.
[0089] The computing device 450 may also communicate audibly using audio codec 460, which may receive spoken information from a user and convert it to usable digital information. Audio codec 460 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the computing device 450. Such sound may include audio from voice telephone calls, recorded audio (e.g., voice messages, music
files, etc.), and may also further include audio generated by applications operating on the computing device 450.
[0090] The computing device 450 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 460. It may also be implemented as part of a smartphone 482, personal digital assistant, a computer tablet, or other similar mobile device.
[0091] Thus, various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system, including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
[0092] These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
[0093] To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., an LCD (liquid crystal display) monitor, LED display, or any other flat panel display) for displaying information to the user, a keyboard, and a pointing device (e.g., mouse, joystick, trackball, or similar device) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile), and input from the user can be received in any form, including acoustic, speech, or tactile input.
[0094] The systems and techniques described here may be implemented in a computing system (e.g., computing device 400 and/or 450) that includes a back-end component, or that includes a middleware component (e.g., an application server), or that includes a front-end component such as a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the systems and techniques described herein, or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
[0095] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication
network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[0096] It will be appreciated that the virtual personification AI system may be used to implement many possible applications. For example, a conventional weather application may display weather searchable by pre-determined parameters (such as location and date). In contrast, the AI system may output a virtual personification of a popular on-camera meteorologist (such as Jim Cantore) to not only provide weather based on pre-determined parameters, but also to further interact with the user. For example, a user may first request the current weather and a forecast for Hawaii, and then ask, “What clothes should I pack for my upcoming vacation?” A conventional weather application will not understand the user question because it may not remember the context (weather in Hawaii). The virtual personification AI system, on the other hand, may understand the context of the user question and accurately determine the user’s true request, which may involve a retrieval of the user’s calendar to determine the accurate date range and/or location for the upcoming vacation, a projection of the weather during that date range, and an analysis of proper attire given the weather, personal data (such as user preferences on clothing items), and location (which may take into account additional factors such as humidity, altitude, and local culture). Further, the AI may be capable of presenting the proper response to the user request using the virtual personification of Jim Cantore in a conversational format, even though that person has not previously answered that question or provided that particular response in the past.
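A purely hypothetical sketch of that kind of context-aware request resolution follows; the data sources, the forecast lookup, and the packing rules are all invented for illustration and do not reflect any actual implementation:

```python
# Hypothetical sketch of resolving "what clothes should I pack?" using the
# remembered context (location) plus calendar and forecast lookups. The data
# sources and the packing rules are invented for illustration only.

def packing_suggestion(context, calendar, forecast_fn, preferences):
    location = context.get("last_location")          # remembered from the prior turn
    trip = calendar.get(location)                    # date range of the vacation
    forecast = forecast_fn(location, trip)           # projected weather for those dates
    if forecast["high_f"] >= 80:
        items = ["t-shirts", "shorts", "sunscreen"]
    else:
        items = ["light jacket", "long pants"]
    # Filter against stored user preferences.
    return [i for i in items if i not in preferences.get("avoid", [])]

context = {"last_location": "Hawaii"}
calendar = {"Hawaii": ("2024-03-10", "2024-03-17")}
forecast_fn = lambda loc, dates: {"high_f": 84, "low_f": 70}
preferences = {"avoid": ["shorts"]}

print(packing_suggestion(context, calendar, forecast_fn, preferences))
```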
[0097] It is contemplated that pre-recorded data may be analyzed, and virtual personifications may be generated, for any person(s), not just famous ones. For example, a user may submit family videos of a great-grandfather who has passed away, and the virtual personification system may render a virtual personification of the great-grandfather, who may interact with future generations. This concept may be applied to any person, living or passed away. In addition, the system may create virtual personifications which have an appearance different from anyone living or previously alive.
[0098] In one embodiment, the AI may supplement missing footage with its own. In an example where a user asks the virtual personification of Jillian Michaels, a fitness coach, for a standing core exercise, but no recording of Jillian Michaels performing such an exercise exists, a new rendering may be generated using footage of Jillian Michaels performing other, similar actions, or footage of other people performing the requested exercise, rendered to appear as Jillian Michaels. The new rendering may be generated using deepfake or other technology to combine Jillian Michaels’ footage with generic footage of another person performing a standing core exercise. As yet another alternative, the AI may provide a set of default footage (such as a default virtual model performing standing core exercises) and superimpose whatever footage of Jillian Michaels may be available on the default model. It is contemplated that any other type of technology may be used to generate new renderings where no pre-recorded data exist.
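Choosing among those fallback strategies could be pictured with the following hypothetical sketch; the strategy names mirror the alternatives above, while the decision logic itself is only illustrative:

```python
# Hypothetical sketch of choosing a rendering strategy when footage of the
# requested action does not exist for the personified subject. The strategy
# names mirror the alternatives discussed above; the logic is illustrative.

def choose_rendering_strategy(has_similar_subject_footage, has_generic_footage):
    if has_similar_subject_footage:
        return "adapt_similar_footage"        # reuse the subject's own similar actions
    if has_generic_footage:
        return "map_subject_onto_generic"     # deepfake-style identity transfer
    return "superimpose_on_default_model"     # fall back to a default virtual model

print(choose_rendering_strategy(has_similar_subject_footage=False,
                                has_generic_footage=True))
```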
[0099] Another possible expansion of the virtual personification AI system may be to generate and render entirely imaginary characters. For example, by combining a new character design for a two-headed animal with default models or existing footage of a dinosaur, a virtual personification of a new two-headed dinosaur may be generated; its behavior may be based on analysis of a wolf, or it may interact with the user with the voice of an old man. Thus, the virtual personification may be customized based on both pre-recorded data and user preferences. It is contemplated that the personification is not limited to people but may include other things (real or not real), such as but not limited to animals, robots, cars, birds, fish, extinct species, alien creatures, created items or beings, or any other item.
[0100] The virtual personification AI system’s ability to generate any type of virtual personification, and its versatility of customization, enables broad application to all types of technology and environments. One exemplary use may be education, where classes and/or tutorials may be provided to students with a virtual personification of an instructor on any subject, which may automate some or all of a student’s classroom experience without compromising the student’s personal interaction (such as the ability to ask questions). As an improvement to in-person experiences, the virtual instructor may draw information from existing knowledge or databases, thereby providing answers a live instructor may not have. The class may be taught by a virtual representation of someone famous, such as Albert Einstein, Sandra Day O’Connor, or Alexander Graham Bell.
[0101] Another exemplary use may be training, which may include standardized training for professional purposes (such as customized professional training of airplane pilots using a virtual personification of a flight instructor and the cockpit), or training for hobbies or informal learning (such as cooking with a virtual personification of Gordon Ramsey). In the case of an airplane, the virtual environment may react to different actions by the user, such as the use of certain controls, to provide realistic training.
[0102] Similarly, the virtual personification AI system may be used to generate real-time response instructions, such as medical or emergency training. For example, a 9-1-1 dispatcher may assist a caller to perform CPR by transmitting footage of a virtual personification of medical personnel performing CPR on a patient while waiting for an ambulance to arrive. The caller may interact with the medical personnel by asking questions such as, “The patient is still not breathing, now what?” Answers may be pulled from a database, and the personification may actually perform the answer to the question so the user can see how to carry out the medical instruction. This concept can be applied to teaching tools for medical procedures, such that virtual representations of the best doctors in the world can be created to show how to perform a procedure and dynamically respond to any question for personalized teaching.
[0103] Yet another exemplary use may be entertainment, such as allowing users to interact with famous people from the past or imaginary characters in a conversational setting. This allows people to have personal interactions with people from history, using the AI databases such that the virtual representation can answer any question or perform any action in real time during user interaction. The famous people from which virtual representations are created could be any person, famous for any reason. A database may be created for each person so that the AI system can accurately create visual, audio, and knowledge representations.
[0105] In future expansions of the technology, it may also be possible to simulate virtual personification without user input. For example, an entire lecture series may be generated using a virtual personification of an instructor and combining footage from recordings of past real-life lectures. Anticipatory answers may be provided through analysis of recordings of past student questions, thereby eliminating the need for further student interaction (which may still be provided as an additional feature).
[0106] Yet another example may be a simulated room where two virtual personifications may interact with each other instead of (or in addition to) interacting with users. For example, a user may wish to simulate a philosophical debate between Socrates and Kant. In one example, the virtual room may be expanded to include entire virtual worlds and large populations of virtual characters. The user can learn from seeing how two or more virtual representations interact, such as in professional environments, military situations, formal engagements, or social interactions.
[0107] While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible that are within the scope of this invention. In addition, the various features, elements, and embodiments described herein may be claimed or combined in any combination or arrangement.
Claims
1. A method to generate and update a virtual personification using artificial intelligence comprising the steps of:
receiving data associated with a person, the data comprising one or more of the following: text files, audio files, image files, and video files;
rendering a virtual personification of the person and outputting the virtual personification to a user;
receiving and interpreting a user input to generate a user request;
updating the virtual personification in response to the user request, the update comprising one or more of the following:
responsive to the user request, generating an audio output using the text files and the audio files of the person; and
responsive to the user request, generating a video output using the image files and the video files of the person,
wherein the audio output and the video output are presented to the user by the virtual personification, and the audio output and the video output presented by the virtual personification have not previously occurred by the person or thing represented by the virtual personification.
2. The method of claim 1 wherein the virtual personification is of a person, either living or deceased.
3. The method of claim 1 wherein the virtual personification comprises an audio output and a video output presented in a virtual environment of a type associated with the virtual personification.
4. The method of claim 1 wherein the virtual personification comprises a representation of a non-living item.
5. The method of claim 1 wherein the method further comprises, responsive to being unable to create the generated response at the virtual reality device, transmitting the question or request from the user to a remote artificial intelligence module.
6. The method of claim 5, wherein the remote artificial intelligence module is a computing device with a processor and memory storing machine readable code configured to:
receive the question or request from the user via the virtual reality device;
process the question or request to derive a meaning;
perform one or more searches for answers to the question or request in databases unrelated to the virtual personification;
upon locating an answer to the question or request, generate data that represents the virtual personification answering the question or request; and
transmit the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user.
7. The method of claim 1 further comprising tracking a hand position of a user with one or more user hand position tracking devices to determine what the user is pointing at in the virtual environment.
8. The method of claim 1 wherein the generated response to the question or request uses artificial intelligence to generate an answer by searching one or more databases that contain information from the person represented by the virtual personification but which does not provide a direct answer to the question or request.
9. A system for presenting an interactive, artificial intelligence assisted, virtual personification to a user comprising:
a virtual reality device configured to have at least a portion be worn by the user comprising:
a wearable screen configured for viewing by a user;
one or more speakers configured to provide audio output to the user;
a microphone configured to receive audio input from the user;
one or more external input devices configured to receive input from the user;
a communication module configured to communicate over a computer network or the Internet;
a processor configured to execute machine readable code;
a memory configured to store the machine readable code, the machine readable code configured to:
present a virtual environment on the wearable screen and through the one or more speakers to the user;
present, to the user on the wearable screen and through the one or more speakers, a virtual personification of a person currently living or deceased, in the virtual environment;
receive a question or request from the user regarding one or more aspects of the virtual environment or the virtual personification;
generate a generated response to the question or request from the user which includes generating video content and audio content which did not previously exist; and
present the generated response to the user on the wearable screen and through the one or more speakers in response to the question or request from the user.
10. The system of claim 9 wherein the machine readable code is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module.
11. The system of claim 10 wherein the remote artificial intelligence module is a computing device with memory and processor such that the memory stores machine readable code configured to: receive the question or request from the user via the virtual reality device; process the question or request to derive a meaning; perform one or more searches for answers to the question or request in databases unrelated to the virtual personification; upon locating an answer to the question or request, generate data that represents the virtual personification answering the question or request; and
transmit the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user.
12. The system of claim 9 further comprising one or more user hand position tracking devices configured to track a position of a user's hand to determine what the user is pointing at in the virtual environment.
13. The system of claim 9 wherein the input from the user comprises an audio input or an input from the user to the one or more external input devices.
14. The system of claim 9 wherein the video content and audio content which did not previously exist are generated by processing existing video, audio, or both of the person represented by the virtual personification.
15. The system of claim 9 wherein the generated response to the question or request is generated using artificial intelligence to generate an answer by searching one or more databases that contain information from the person represented by the virtual personification but which does not provide a direct answer to the question or request.
16. A method for presenting an interactive experience with a virtual personification using a screen, speakers, and microphone of a user computing device, the method comprising:
presenting a virtual environment on the screen and through the speakers to the user and presenting the virtual personification in the virtual environment;
receiving input from the user comprising a question, a request, or a subject regarding one or more aspects of the virtual environment, the virtual personification, or the actions of the virtual personification in the virtual environment;
sending a request for a response to the input from the user to an AI computing device that is remote from the user computing device;
with the AI computing device, creating a response based on pre-existing content stored in one or more databases, which is processed to create the generated response;
transmitting the generated response to the user computing device;
at the user computing device, based on the generated response from the AI computing device, generating video content and audio content which did not previously exist; and
presenting the video content and audio content which did not previously exist to the user.
17. The method of claim 16 wherein the AI computing device is a computing device with memory and a processor such that the memory stores machine readable code configured for:
receiving the input from the user computing device;
processing the input from the user to derive a meaning;
based on the meaning, performing one or more searches for answers to the input from the user in databases unrelated to the virtual personification;
upon locating a response to the input from the user, generating data that represents the virtual personification answering the question or request; and
transmitting the data that represents the virtual personification responding to the input from the user to the user computing device.
18. The method of claim 16 further comprising monitoring one or more user hand position tracking devices configured to track a position of a user's hand to determine what the user is pointing at in the virtual environment, and interpreting the pointing as the input from the user.
19. The method of claim 16 wherein the input from the user comprises an audio input or an input from the user to one or more external input devices.
20. The method of claim 16 wherein the video content and audio content which did not previously exist are generated by processing existing video, audio, or both of a person represented by the virtual personification to generate new content.
US7831564B1 (en) | 2025-08-06 | 2025-08-06 | Symantec Operating Corporation | Method and system of generating a point-in-time image of at least a portion of a database |
US20070043736A1 (en) | 2025-08-06 | 2025-08-06 | Microsoft Corporation | Smart find |
KR100657331B1 (en) | 2025-08-06 | 2025-08-06 | ???????? | An image forming apparatus employing a multi processor and an image forming method using the same |
US8600977B2 (en) | 2025-08-06 | 2025-08-06 | Oracle International Corporation | Automatic recognition and capture of SQL execution plans |
KR20100035391A (en) | 2025-08-06 | 2025-08-06 | ????????? | Valve module for changing flow paths and soft water apparatu |
KR101042515B1 (en) | 2025-08-06 | 2025-08-06 | ???? ???? | Information retrieval method and information provision method based on user's intention |
US20100205222A1 (en) | 2025-08-06 | 2025-08-06 | Tom Gajdos | Music profiling |
US8326637B2 (en) | 2025-08-06 | 2025-08-06 | Voicebox Technologies, Inc. | System and method for processing multi-modal device interactions in a natural language voice services environment |
TWI432347B (en) | 2025-08-06 | 2025-08-06 | Wistron Corp | Holder device which could adjust positions automatically, and the combination of the holder device and the electronic device |
US8954431B2 (en) | 2025-08-06 | 2025-08-06 | Xerox Corporation | Smart collaborative brainstorming tool |
US9009041B2 (en) | 2025-08-06 | 2025-08-06 | Nuance Communications, Inc. | Systems and methods for improving the accuracy of a transcription using auxiliary data such as personal data |
US8762156B2 (en) | 2025-08-06 | 2025-08-06 | Apple Inc. | Speech recognition repair using contextual information |
US9542956B1 (en) | 2025-08-06 | 2025-08-06 | Interactive Voice, Inc. | Systems and methods for responding to human spoken audio |
US9280610B2 (en) | 2025-08-06 | 2025-08-06 | Apple Inc. | Crowd sourcing information to fulfill user requests |
KR101399472B1 (en) | 2025-08-06 | 2025-08-06 | (?)????? | Method and apparatus for rendering processing by using multiple processings |
EP3865056A1 (en) | 2025-08-06 | 2025-08-06 | InteraXon Inc. | Systems and methods for collecting, analyzing, and sharing bio-signal and non-bio-signal data |
KR20140078169A (en) | 2025-08-06 | 2025-08-06 | ???????? | Imaging apparatus, magnetic resonance imaging and method for controlling the imaging apparatus or the magnetic resonance imaging apparatus |
US20150351655A1 (en) | 2025-08-06 | 2025-08-06 | Interaxon Inc. | Adaptive brain training computer system and method |
CN113470640B (en) | 2025-08-06 | 2025-08-06 | 苹果公司 | Voice triggers for digital assistants |
US9172747B2 (en) | 2025-08-06 | 2025-08-06 | Artificial Solutions Iberia SL | System and methods for virtual assistant networks |
CN110096712B (en) | 2025-08-06 | 2025-08-06 | 苹果公司 | User training through intelligent digital assistant |
US9058805B2 (en) | 2025-08-06 | 2025-08-06 | Google Inc. | Multiple recognizer speech recognition |
US10390732B2 (en) | 2025-08-06 | 2025-08-06 | Digital Ally, Inc. | Breath analyzer, system, and computer program for authenticating, preserving, and presenting breath analysis data |
CN105637445B (en) | 2025-08-06 | 2025-08-06 | 奥誓公司 | System and method for providing a context-based user interface |
US9721570B1 (en) | 2025-08-06 | 2025-08-06 | Amazon Technologies, Inc. | Outcome-oriented dialogs on a speech recognition platform |
TWM483638U (en) | 2025-08-06 | 2025-08-06 | Taer Innovation Co Ltd | Stand |
US20150288857A1 (en) | 2025-08-06 | 2025-08-06 | Microsoft Corporation | Mount that facilitates positioning and orienting a mobile computing device |
US9830556B2 (en) | 2025-08-06 | 2025-08-06 | Excalibur Ip, Llc | Synthetic question formulation |
US9727798B2 (en) * | 2025-08-06 | 2025-08-06 | Acrovirt, LLC | Generating and using a predictive virtual personification |
US9607102B2 (en) | 2025-08-06 | 2025-08-06 | Nuance Communications, Inc. | Task switching in dialogue processing |
US10254928B1 (en) | 2025-08-06 | 2025-08-06 | Amazon Technologies, Inc. | Contextual card generation and delivery |
US9774682B2 (en) | 2025-08-06 | 2025-08-06 | International Business Machines Corporation | Parallel data streaming between cloud-based applications and massively parallel systems |
US10756963B2 (en) | 2025-08-06 | 2025-08-06 | Pulzze Systems, Inc. | System and method for developing run time self-modifying interaction solution through configuration |
US10395021B2 (en) | 2025-08-06 | 2025-08-06 | Mesh Candy, Inc. | Security and identification system and method using data collection and messaging over a dynamic mesh network with multiple protocols |
US10582011B2 (en) | 2025-08-06 | 2025-08-06 | Samsung Electronics Co., Ltd. | Application cards based on contextual data |
US10888270B2 (en) | 2025-08-06 | 2025-08-06 | Avishai Abrahami | Cognitive state alteration system integrating multiple feedback technologies |
US10709371B2 (en) | 2025-08-06 | 2025-08-06 | WellBrain, Inc. | System and methods for serving a custom meditation program to a patient |
US11587559B2 (en) | 2025-08-06 | 2025-08-06 | Apple Inc. | Intelligent device identification |
US10249207B2 (en) | 2025-08-06 | 2025-08-06 | TheBeamer, LLC | Educational teaching system and method utilizing interactive avatars with learning manager and authoring manager functions |
US10188345B2 (en) | 2025-08-06 | 2025-08-06 | Fitbit, Inc. | Method and apparatus for providing biofeedback during meditation exercise |
US10872306B2 (en) | 2025-08-06 | 2025-08-06 | Smiota, Inc. | Facilitating retrieval of items from an electronic device |
KR102656806B1 (en) | 2025-08-06 | 2025-08-06 | ???? ???? | Watch type terminal and method of contolling the same |
US10631743B2 (en) | 2025-08-06 | 2025-08-06 | The Staywell Company, Llc | Virtual reality guided meditation with biofeedback |
US10156775B2 (en) | 2025-08-06 | 2025-08-06 | Eric Zimmermann | Extensible mobile recording device holder |
DK179309B1 (en) | 2025-08-06 | 2025-08-06 | Apple Inc | Intelligent automated assistant in a home environment |
US20170357910A1 (en) | 2025-08-06 | 2025-08-06 | Apple Inc. | System for iteratively training an artificial intelligence using cloud-based metrics |
US11200891B2 (en) | 2025-08-06 | 2025-08-06 | Hewlett-Packard Development Company, L.P. | Communications utilizing multiple virtual assistant services |
US10346401B2 (en) | 2025-08-06 | 2025-08-06 | Accenture Global Solutions Limited | Query rewriting in a relational data harmonization framework |
US10244122B2 (en) | 2025-08-06 | 2025-08-06 | Vivint, Inc. | Panel control over broadband |
WO2018022085A1 (en) | 2025-08-06 | 2025-08-06 | Hewlett-Packard Development Company, L.P. | Identification of preferred communication devices |
US9654598B1 (en) | 2025-08-06 | 2025-08-06 | Le Technology, Inc. | User customization of cards |
US20180054228A1 (en) | 2025-08-06 | 2025-08-06 | I-Tan Lin | Teleoperated electronic device holder |
US10798548B2 (en) | 2025-08-06 | 2025-08-06 | Lg Electronics Inc. | Method for controlling device by using Bluetooth technology, and apparatus |
US10423685B2 (en) | 2025-08-06 | 2025-08-06 | Robert Bosch Gmbh | System and method for automatic question generation from knowledge base |
US9959861B2 (en) | 2025-08-06 | 2025-08-06 | Robert Bosch Gmbh | System and method for speech recognition |
US10855714B2 (en) | 2025-08-06 | 2025-08-06 | KnowBe4, Inc. | Systems and methods for an artificial intelligence driven agent |
US11429586B2 (en) | 2025-08-06 | 2025-08-06 | Sap Se | Expression update validation |
US10365932B2 (en) | 2025-08-06 | 2025-08-06 | Essential Products, Inc. | Dynamic application customization for automated environments |
US20180232920A1 (en) | 2025-08-06 | 2025-08-06 | Microsoft Technology Licensing, Llc | Contextually aware location selections for teleconference monitor views |
KR102384641B1 (en) | 2025-08-06 | 2025-08-06 | ???? ???? | Method for controlling an intelligent system that performs multilingual processing |
WO2018155920A1 (en) | 2025-08-06 | 2025-08-06 | Samsung Electronics Co., Ltd. | Method and apparatus for authenticating users in internet of things environment |
DK3628101T3 (en) | 2025-08-06 | 2025-08-06 | Better Therapeutics Inc | METHOD AND SYSTEM FOR ADMINISTRATION OF LIFESTYLE AND HEALTH INTERVENTIONS |
DK201770427A1 (en) | 2025-08-06 | 2025-08-06 | Apple Inc. | Low-latency intelligent automated assistant |
US10554595B2 (en) | 2025-08-06 | 2025-08-06 | Genesys Telecommunications Laboratories, Inc. | Contact center system and method for advanced outbound communications to a contact group |
CN107423364B (en) | 2025-08-06 | 2025-08-06 | 百度在线网络技术(北京)有限公司 | Method, device and storage medium for answering operation broadcasting based on artificial intelligence |
EP3435642A1 (en) | 2025-08-06 | 2025-08-06 | Advanced Digital Broadcast S.A. | A system and method for remote control of appliances by voice |
US20190122121A1 (en) | 2025-08-06 | 2025-08-06 | AISA Innotech Inc. | Method and system for generating individual microdata |
US11227448B2 (en) | 2025-08-06 | 2025-08-06 | Nvidia Corporation | Cloud-centric platform for collaboration and connectivity on 3D virtual environments |
US11295735B1 (en) | 2025-08-06 | 2025-08-06 | Amazon Technologies, Inc. | Customizing voice-control for developer devices |
US11250336B2 (en) | 2025-08-06 | 2025-08-06 | Intel Corporation | Distributed and contextualized artificial intelligence inference service |
US10963499B2 (en) | 2025-08-06 | 2025-08-06 | Aiqudo, Inc. | Generating command-specific language model discourses for digital assistant interpretation |
US10729399B2 (en) | 2025-08-06 | 2025-08-06 | KUB Technologies, Inc. | System and method for cabinet X-ray system with camera and X-ray images superimposition |
EP4138074A1 (en) | 2025-08-06 | 2025-08-06 | Google LLC | Facilitating end-to-end communications with automated assistants in multiple languages |
KR102508677B1 (en) | 2025-08-06 | 2025-08-06 | ???????? | System for processing user utterance and controlling method thereof |
WO2019183062A1 (en) | 2025-08-06 | 2025-08-06 | Facet Labs, Llc | Interactive dementia assistive devices and systems with artificial intelligence, and related methods |
US20190354599A1 (en) | 2025-08-06 | 2025-08-06 | Microsoft Technology Licensing, Llc | Ai model canvas |
US20200001040A1 (en) | 2025-08-06 | 2025-08-06 | Levels Products, Inc. | Method, apparatus, and system for meditation |
CN110728363B (en) | 2025-08-06 | 2025-08-06 | 华为技术有限公司 | Task processing method and device |
US10769495B2 (en) | 2025-08-06 | 2025-08-06 | Adobe Inc. | Collecting multimodal image editing requests |
US20210398671A1 (en) | 2025-08-06 | 2025-08-06 | Healthpointe Solutions, Inc. | System and method for recommending items in conversational streams |
KR101994592B1 (en) | 2025-08-06 | 2025-08-06 | ????? ????? | Automatic video content metadata creation method and system |
US10402589B1 (en) | 2025-08-06 | 2025-08-06 | Vijay K. Madisetti | Method and system for securing cloud storage and databases from insider threats and optimizing performance |
US20200242146A1 (en) | 2025-08-06 | 2025-08-06 | Andrew R. Kalukin | Artificial intelligence system for generating conjectures and comprehending text, audio, and visual data using natural language understanding |
JP2020123131A (en) | 2025-08-06 | 2025-08-06 | 株式会社東芝 | Dialog system, dialog method, program, and storage medium |
US11544594B2 (en) | 2025-08-06 | 2025-08-06 | Sunghee Woo | Electronic device comprising user interface for providing user-participating-type AI training service, and server and method for providing user-participating-type AI training service using the electronic device |
WO2020214988A1 (en) | 2025-08-06 | 2025-08-06 | Tempus Labs | Collaborative artificial intelligence method and system |
US11328717B2 (en) | 2025-08-06 | 2025-08-06 | Lg Electronics Inc. | Electronic device, operating method thereof, system having plural artificial intelligence devices |
US20200342968A1 (en) | 2025-08-06 | 2025-08-06 | GE Precision Healthcare LLC | Visualization of medical device event processing |
US11393491B2 (en) | 2025-08-06 | 2025-08-06 | Lg Electronics Inc. | Artificial intelligence device capable of controlling operation of another device and method of operating the same |
KR20190080834A (en) | 2025-08-06 | 2025-08-06 | ???? ???? | Dialect phoneme adaptive training system and method |
US11501753B2 (en) | 2025-08-06 | 2025-08-06 | Samsung Electronics Co., Ltd. | System and method for automating natural language understanding (NLU) in skill development |
US11461376B2 (en) | 2025-08-06 | 2025-08-06 | International Business Machines Corporation | Knowledge-based information retrieval system evaluation |
US20210011887A1 (en) | 2025-08-06 | 2025-08-06 | Qualcomm Incorporated | Activity query response system |
KR20190095181A (en) | 2025-08-06 | 2025-08-06 | ???? ???? | Video conference system using artificial intelligence |
KR20190099167A (en) | 2025-08-06 | 2025-08-06 | ???? ???? | An artificial intelligence apparatus for performing speech recognition and method for the same |
US11222464B2 (en) | 2025-08-06 | 2025-08-06 | The Travelers Indemnity Company | Intelligent imagery |
US11636102B2 (en) | 2025-08-06 | 2025-08-06 | Verizon Patent And Licensing Inc. | Natural language-based content system with corrective feedback and training |
US10827028B1 (en) | 2025-08-06 | 2025-08-06 | Spotify Ab | Systems and methods for playing media content on a target device |
KR20210066328A (en) | 2025-08-06 | 2025-08-06 | ???? ???? | An artificial intelligence apparatus for learning natural language understanding models |
US11983640B2 (en) | 2025-08-06 | 2025-08-06 | International Business Machines Corporation | Generating question templates in a knowledge-graph based question and answer system |
US11042369B1 (en) | 2025-08-06 | 2025-08-06 | Architecture Technology Corporation | Systems and methods for modernizing and optimizing legacy source code |
WO2021188719A1 (en) | 2025-08-06 | 2025-08-06 | MeetKai, Inc. | An intelligent layer to power cross platform, edge-cloud hybrid artificial intelligence services |
US11995561B2 (en) | 2025-08-06 | 2025-08-06 | MeetKai, Inc. | Universal client API for AI services |
US11521597B2 (en) | 2025-08-06 | 2025-08-06 | Google Llc | Correcting speech misrecognition of spoken utterances |
US11984124B2 (en) | 2025-08-06 | 2025-08-06 | Apple Inc. | Speculative task flow execution |
US11676593B2 (en) | 2025-08-06 | 2025-08-06 | International Business Machines Corporation | Training an artificial intelligence of a voice response system based on non-verbal feedback |
US11550831B1 (en) * | 2025-08-06 | 2025-08-06 | TrueSelph, Inc. | Systems and methods for generation and deployment of a human-personified virtual agent using pre-trained machine learning-based language models and a video response corpus |
2023
- 2025-08-06 WO PCT/US2023/010624 patent/WO2023137078A1/en active Application Filing
- 2025-08-06 US US18/095,987 patent/US12346994B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190050686A1 (en) * | 2025-08-06 | 2025-08-06 | Intel Corporation | Methods and apparatus to add common sense reasoning to artificial intelligence in the context of human machine interfaces |
US11107465B2 (en) * | 2025-08-06 | 2025-08-06 | Storyfile, Llc | Natural conversation storytelling system |
WO2020136615A1 (en) * | 2025-08-06 | 2025-08-06 | Pankaj Uday Raut | A system and a method for generating a head mounted device based artificial intelligence (ai) bot |
WO2020247590A1 (en) * | 2025-08-06 | 2025-08-06 | Artie, Inc. | Multi-modal model for dynamically responsive virtual characters |
Non-Patent Citations (1)
Title |
---|
BOGDANOVYCH ANTON, RICHARDS DEBORAH, SIMOFF SIMEON, PELACHAUD CATHERINE, HEYLEN DIRK, TRESCAK TOMAS, WU JASON, GHOSH SAYAN, CHOLLE: "NADiA : Neural Network Driven Virtual Human Conversation Agents", PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS, ACM, NEW YORK, NY, USA, 5 November 2018 (2018-11-05), New York, NY, USA, pages 173 - 178, XP093079351, ISBN: 978-1-4503-6013-5, DOI: 10.1145/3267851.3267860 * |
Also Published As
Publication number | Publication date |
---|---|
US20230230293A1 (en) | 2025-08-06 |
US12346994B2 (en) | 2025-08-06 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112074899B (en) | System and method for intelligent initiation of human-computer dialogue based on multimodal sensor input | |
US20230018473A1 (en) | System and method for conversational agent via adaptive caching of dialogue tree | |
JP6816925B2 (en) | Data processing method and equipment for childcare robots | |
CN111801730B (en) | Systems and methods for artificial intelligence driven auto-chaperones | |
US11468894B2 (en) | System and method for personalizing dialogue based on user's appearances | |
CN112204564A (en) | System and method for speech understanding via integrated audio and visual based speech recognition | |
US11003860B2 (en) | System and method for learning preferences in dialogue personalization | |
CN112204565B (en) | Systems and methods for inferring scenes based on visual context-free grammar models | |
US20190251701A1 (en) | System and method for identifying a point of interest based on intersecting visual trajectories | |
US20190251957A1 (en) | System and method for prediction based preemptive generation of dialogue content | |
US20190251716A1 (en) | System and method for visual scene construction based on user communication | |
US20220215678A1 (en) | System and method for reconstructing unoccupied 3d space | |
US10785489B2 (en) | System and method for visual rendering based on sparse samples with predicted motion | |
US12346994B2 (en) | Method and system for virtual intelligence user interaction | |
WO2021030449A1 (en) | System and method for adaptive dialogue via scene modeling using combinational neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23740636; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 23740636; Country of ref document: EP; Kind code of ref document: A1 |